arXiv:1503.06876v2 [stat.ME] 30 Jan 2016 


Binary and Multi-Bit Coding for Stable 
Random Projections 


Ping Li 

Department of Statistics and Biostatistics 
Department of Computer Science 
Rutgers University 
Piscataway, NJ 08854, USA 
pingli@stat.rutgers.edu


Abstract 

We develop efficient binary (i.e., 1-bit) and multi-bit coding schemes for estimating the scale
parameter of $\alpha$-stable distributions. The work is motivated by the recent work on one scan
1-bit compressed sensing (sparse signal recovery) [12] using $\alpha$-stable random projections,
which requires estimating the scale parameter at the bit level. Our technique can be naturally
applied to data stream computations for estimating the $\alpha$-th frequency moment. In fact, the
method applies to the general scale family of distributions and is not limited to $\alpha$-stable distributions.

Due to the heavy-tailed nature of $\alpha$-stable distributions, traditional estimators will potentially
need many bits to store each measurement in order to ensure sufficient accuracy. Interestingly,
our paper demonstrates that a simple closed-form estimator with merely 1-bit information does
not result in a significant loss of accuracy if the parameter is chosen appropriately. For example,
when $\alpha = 0+$, 1, and 2, the coefficients of the optimal estimation variances using full (i.e.,
infinite-bit) information are 1, 2, and 2, respectively. With the 1-bit scheme and appropriately
chosen parameters, the corresponding variance coefficients are 1.544, $\pi^2/4$, and 3.066,
respectively. Theoretical tail bounds are also provided. Using 2 or more bits per measurement
reduces the estimation variance and, importantly, stabilizes the estimate so that the variance is
not sensitive to parameters. With look-up tables, the computational cost is minimal.

Extensive simulations are conducted to verify the theoretical results. The estimation procedure
is integrated into the sparse recovery with one scan 1-bit compressed sensing. One interesting
observation is that the classical "Bartlett correction" (for MLE bias correction) appears
particularly effective for our problem when the sample size (number of measurements) is small.


1 Introduction 

The research problem of interest is the efficient estimation of the scale parameter of the $\alpha$-stable
distribution using binary (i.e., 1-bit) and multi-bit coding of the samples. That is, given $n$ i.i.d.
samples,

$$y_j \sim S(\alpha, \Lambda_\alpha), \quad j = 1, 2, \ldots, n \tag{1}$$

from an $\alpha$-stable distribution $S(\alpha, \Lambda_\alpha)$, we hope to estimate the scale parameter $\Lambda_\alpha$ by using only
1-bit or multi-bit information of $|y_j|$. Here we adopt the parameterization [22, 19] such that, if
$y \sim S(\alpha, \Lambda_\alpha)$, then the characteristic function is $E\left(e^{\sqrt{-1}\,yt}\right) = e^{-\Lambda_\alpha |t|^\alpha}$. Note that, under this
parameterization, when $\alpha = 2$, $S(2, \Lambda_2)$ is equivalent to a Gaussian distribution $N(0, \sigma^2 = 2\Lambda_2)$.
When $\alpha = 1$, $S(1, 1)$ is the standard Cauchy distribution.


1.1 Sampling from the $\alpha$-stable Distribution


Although in general there is no closed-form density of $S(\alpha, 1)$, we can sample from the distribution
using a standard procedure provided by [5]. That is, one can first sample an exponential $w \sim \exp(1)$
and a uniform $u \sim \text{unif}(-\pi/2, \pi/2)$, and then compute

$$s_\alpha = \frac{\sin(\alpha u)}{(\cos u)^{1/\alpha}} \left[\frac{\cos(u - \alpha u)}{w}\right]^{(1-\alpha)/\alpha} \sim S(\alpha, 1) \tag{2}$$

This paper will heavily use the distribution of $|s_\alpha|^\alpha$:

$$|s_\alpha|^\alpha = \frac{|\sin(\alpha u)|^\alpha}{\cos u} \left[\frac{\cos(u - \alpha u)}{w}\right]^{1-\alpha} \tag{3}$$

Intuitively, as $\alpha \to 0$, $1/|s_\alpha|^\alpha$ converges to $\exp(1)$ in distribution, as formally established by [7].
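
The sampling recipe in (2) can be sketched in a few lines of Python. This is only an illustration under our own naming (`stable_from_uw` and `sample_stable` are hypothetical helper names, not from the paper); it uses the standard library only. Two special cases give a quick sanity check: at $\alpha = 1$ the formula reduces to $\tan u$ (standard Cauchy), and at $\alpha = 2$ it reduces to $2\sin(u)\sqrt{w}$.

```python
import math
import random


def stable_from_uw(alpha, u, w):
    """Map (u, w) to s_alpha following Eq. (2); requires 0 < alpha <= 2, w > 0."""
    head = math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
    tail = (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha)
    return head * tail


def sample_stable(alpha, rng=random):
    """Draw one sample from S(alpha, 1): w ~ exp(1), u ~ unif(-pi/2, pi/2)."""
    w = rng.expovariate(1.0)
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    return stable_from_uw(alpha, u, w)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 0.05
    # For small alpha, 1/|s|^alpha should behave roughly like an exp(1) variable,
    # so its sample mean should be in the vicinity of 1.
    vals = [1.0 / abs(sample_stable(alpha, rng)) ** alpha for _ in range(10000)]
    print(sum(vals) / len(vals))
```

For very small $\alpha$ the raw samples are extremely heavy-tailed, so a production implementation would likely work with $\log|s_\alpha|$ instead of $s_\alpha$ itself; that refinement is omitted here.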


The use of $\alpha$-stable distributions [?] was studied in the context of estimating frequency
moments of data streams [?]. The use of $\alpha$-stable random projections for sparse signal recovery
was established in (e.g.,) [?], by using full (i.e., infinite-bit) information of the measurements. In
this paper, the development of binary (1-bit) and multi-bit coding schemes is also motivated by
the recent work on "one scan 1-bit compressed sensing" [12].


1.2 One Scan 1-Bit Compressed Sensing 

In contrast to classical compressed sensing (CS) [8, ?] and 1-bit compressed sensing [?],
there is a recent line of work on sparse signal recovery based on heavy-tailed designs [?]. The
main algorithm of "one scan 1-bit compressed sensing" [12] is summarized in Algorithm 1. Given
$n$ measurements $y_j = \sum_{i=1}^{N} x_i s_{ij}$, $j = 1$ to $n$, where $s_{ij} \sim S(\alpha, 1)$ i.i.d. and $x_i$, $i = 1$ to $N$, is a
sparse (and possibly dynamic/streaming) vector, the task is to recover $x$ from only the signs of the
measurements, i.e., $\text{sgn}(y_j)$. Algorithm 1 provides a simple recipe for recovering $x$ from $\text{sgn}(y_j)$
by scanning the coordinates of the vector only once.


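Algorithm 1 Stable measurement collection and the one scan 1-bit algorithm for sign recovery.

Input: $K$-sparse signal $x \in \mathbb{R}^{1 \times N}$, design matrix $S \in \mathbb{R}^{N \times M}$ with entries sampled from $S(\alpha, 1)$ with small $\alpha$ (e.g.,
$\alpha = 0.05$). We sample $u_{ij} \sim \text{uniform}(-\pi/2, \pi/2)$ and $w_{ij} \sim \exp(1)$ and compute $s_{ij}$ by (2).

Collect: Linear measurements: $y_j = \sum_{i=1}^{N} x_i s_{ij}$, $j = 1$ to $M$.

Compute: For each coordinate $i = 1$ to $N$, compute

$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \text{sgn}(y_j)\,\text{sgn}(u_{ij})\, e^{-(K-1) w_{ij}}\right), \quad Q_i^- = \sum_{j=1}^{M} \log\left(1 - \text{sgn}(y_j)\,\text{sgn}(u_{ij})\, e^{-(K-1) w_{ij}}\right)$$

Output: For $i = 1$ to $N$, report the estimated sign:

$$\widehat{\text{sgn}}(x_i) = \begin{cases} +1 & \text{if } Q_i^+ > 0 \\ -1 & \text{if } Q_i^- > 0 \\ 0 & \text{if } Q_i^+ < 0 \text{ and } Q_i^- < 0 \end{cases}$$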
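
A minimal Python sketch of Algorithm 1 follows, assuming $K \ge 2$ is known and using only the standard library. The function name `one_scan_sign_recovery` and its arguments are our own illustrative choices, and the sketch makes no attempt at numerical safeguards for the extremely heavy tails at small $\alpha$.

```python
import math
import random


def sgn(v):
    """Sign function returning -1, 0, or +1 (0 also for NaN)."""
    return (v > 0) - (v < 0)


def one_scan_sign_recovery(x, K, M, alpha=0.05, seed=0):
    """Collect M stable measurements of x and recover sgn(x_i) in one scan.

    Assumes K >= 2 so that log(1 +/- e^{-(K-1) w}) stays finite.
    """
    rng = random.Random(seed)
    N = len(x)
    y = [0.0] * M
    U = [[0.0] * M for _ in range(N)]
    W = [[0.0] * M for _ in range(N)]
    for i in range(N):
        for j in range(M):
            u = rng.uniform(-math.pi / 2, math.pi / 2)
            w = rng.expovariate(1.0)
            U[i][j], W[i][j] = u, w
            if x[i] != 0.0:  # zero coordinates contribute nothing to y_j
                s = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
                     * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))
                y[j] += x[i] * s
    signs = []
    for i in range(N):
        q_plus = q_minus = 0.0
        for j in range(M):
            t = sgn(y[j]) * sgn(U[i][j]) * math.exp(-(K - 1) * W[i][j])
            q_plus += math.log1p(t)    # log(1 + t)
            q_minus += math.log1p(-t)  # log(1 - t)
        signs.append(1 if q_plus > 0 else (-1 if q_minus > 0 else 0))
    return signs


if __name__ == "__main__":
    x = [0.0] * 50
    x[3], x[17], x[40] = 4.0, -6.0, 2.5
    est = one_scan_sign_recovery(x, K=3, M=200)
    print([(i, s) for i, s in enumerate(est) if s != 0])
```

Note that the recovery step touches each coordinate once and uses only $\text{sgn}(y_j)$, $\text{sgn}(u_{ij})$, and $w_{ij}$, which is what makes the procedure "one scan".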


This efficient recovery procedure, however, requires the knowledge of "$K$", which is the $l_\alpha$ norm
$\sum_{i=1}^{N} |x_i|^\alpha$ as $\alpha \to 0+$. In practice, this $K$ will typically have to be estimated, and the hope is that
we do not have to use too many additional measurements just for the task of estimating $K$. In
this paper, we will show that only 1 bit or a few bits per measurement can provide accurate
estimates of $K$ (as well as the general term $\sum_{i=1}^{N} |x_i|^\alpha$ for $0 < \alpha \le 2$).

Because the samples $y_j$ are heavy-tailed, the storage requirement for each sample under
traditional estimators can be substantial, which consequently would cause issues in data retrieval,
transmission, and decoding. It is thus very desirable if we need just 1 bit or a few bits for each $|y_j|$.


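2 Estimation of $\Lambda_\alpha$ Using Full (Infinite-Bit) Information

Given $n$ i.i.d. samples $y_j \sim S(\alpha, \Lambda_\alpha)$, $j = 1$ to $n$, we review various estimators of the scale
parameter $\Lambda_\alpha$ using full information (i.e., infinite-bit). When $\alpha = 2$ (i.e., Gaussian), the arithmetic
mean estimator is statistically optimal, i.e., the (asymptotic) variance reaches the reciprocal of the
Fisher Information from classical statistics theory:

$$\hat{\Lambda}_{2,f} = \frac{1}{n} \sum_{j=1}^{n} \frac{y_j^2}{2}, \qquad \text{Var}\left(\hat{\Lambda}_{2,f}\right) = \frac{\Lambda_2^2}{n}\, 2 \tag{4}$$

When $\alpha = 1$, the MLE $\hat{\Lambda}_{1,f}$ is the solution to the equation

$$\sum_{j=1}^{n} \frac{\hat{\Lambda}_{1,f}^2}{\hat{\Lambda}_{1,f}^2 + y_j^2} = \frac{n}{2}, \qquad \text{Var}\left(\hat{\Lambda}_{1,f}\right) = \frac{\Lambda_1^2}{n}\, 2 + O\left(\frac{1}{n^2}\right) \tag{5}$$

The harmonic mean estimator [?] is suitable for small $\alpha$ and becomes optimal as $\alpha \to 0+$:

$$\hat{\Lambda}_{\alpha,f,hm} = \frac{-\frac{2}{\pi}\Gamma(-\alpha)\sin\left(\frac{\pi}{2}\alpha\right)}{\sum_{j=1}^{n} |y_j|^{-\alpha}} \left(n - \left(\frac{-\pi\Gamma(-2\alpha)\sin(\pi\alpha)}{\left[\Gamma(-\alpha)\sin\left(\frac{\pi}{2}\alpha\right)\right]^2} - 1\right)\right) \tag{6}$$

$$\text{Var}\left(\hat{\Lambda}_{\alpha,f,hm}\right) = \frac{\Lambda_\alpha^2}{n} \left(\frac{-\pi\Gamma(-2\alpha)\sin(\pi\alpha)}{\left[\Gamma(-\alpha)\sin\left(\frac{\pi}{2}\alpha\right)\right]^2} - 1\right) + O\left(\frac{1}{n^2}\right) \tag{7}$$

where $\Gamma(\cdot)$ is the gamma function. When $\alpha \to 0+$, the variance becomes $\frac{\Lambda_{0+}^2}{n} + O\left(\frac{1}{n^2}\right)$.
In summary, the optimal variances for $\alpha = 0+$, 1, and 2, are respectively

$$\frac{\Lambda_{0+}^2}{n}\, 1, \qquad \frac{\Lambda_1^2}{n}\, 2, \qquad \text{and} \qquad \frac{\Lambda_2^2}{n}\, 2 \tag{8}$$

Our goal is to develop 1-bit and multi-bit schemes that achieve variances close to optimal.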
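
As a rough illustration of the three full-information estimators reviewed above, the following Python sketch implements the $\alpha = 2$ arithmetic mean estimator, the $\alpha = 1$ MLE (solved by bisection, which is our own choice of root finder, not something the text prescribes), and the harmonic mean estimator for small $\alpha$. Function names are hypothetical.

```python
import math


def scale_gaussian(y):
    """alpha = 2: arithmetic mean estimator (1/n) * sum(y_j^2 / 2)."""
    return sum(v * v for v in y) / (2.0 * len(y))


def scale_cauchy_mle(y, tol=1e-12):
    """alpha = 1: solve sum(L^2 / (L^2 + y_j^2)) = n/2 for L by bisection."""
    n = len(y)

    def g(L):  # increasing in L, from -n/2 (L -> 0) to n/2 (L -> infinity)
        return sum(L * L / (L * L + v * v) for v in y) - n / 2.0

    lo, hi = 1e-12, 1.0
    while g(hi) < 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scale_harmonic_mean(y, alpha):
    """Harmonic mean estimator; intended for small alpha (alpha < 0.5)."""
    n = len(y)
    k = -(2.0 / math.pi) * math.gamma(-alpha) * math.sin(math.pi * alpha / 2)
    c = (-math.pi * math.gamma(-2 * alpha) * math.sin(math.pi * alpha)
         / (math.gamma(-alpha) * math.sin(math.pi * alpha / 2)) ** 2 - 1.0)
    return k / sum(abs(v) ** (-alpha) for v in y) * (n - c)


if __name__ == "__main__":
    print(scale_gaussian([2.0, -2.0]))
    print(scale_cauchy_mle([1.0, -1.0]))
    print(scale_harmonic_mean([1.0] * 100, 0.01))
```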


3 1-Bit Coding and Estimation 

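Again, consider $n$ i.i.d. samples $y_j \sim S(\alpha, \Lambda_\alpha)$, $j = 1$ to $n$. In this section, the task is to estimate
$\Lambda_\alpha$ using just one bit of information about each $|y_j|$, with a pre-determined threshold. To accomplish this,
we consider a threshold $C$ (which can be a function of $\alpha$) and compare it with $z_j = |y_j|^\alpha$, $j = 1, 2, \ldots, n$.
In other words, we store a "0" if $z_j \le C$ and a "1" if $z_j > C$. Note that we can express $z_j$ as

$$z_j = |y_j|^\alpha \sim \Lambda_\alpha |s_\alpha|^\alpha, \qquad s_\alpha \sim S(\alpha, 1).$$

Let $f_\alpha$ and $F_\alpha$ be the pdf and cdf of $|s_\alpha|^\alpha$, respectively. Then we can define $p_1$ and $p_2$ as follows

$$p_1 = \Pr(z_j \le C) = F_\alpha(C/\Lambda_\alpha),$$

$$p_2 = \Pr(z_j > C) = 1 - p_1 = 1 - F_\alpha(C/\Lambda_\alpha),$$

which are needed for computing the likelihood. Denote

$$n_1 = \sum_{j=1}^{n} 1\{z_j \le C\}, \qquad n_2 = \sum_{j=1}^{n} 1\{z_j > C\}.$$

The log-likelihood of the $n = n_1 + n_2$ observations is

$$l = n_1 \log p_1 + n_2 \log p_2 = n_1 \log F_\alpha(C/\Lambda_\alpha) + n_2 \log\left[1 - F_\alpha(C/\Lambda_\alpha)\right].$$

To seek the MLE (maximum likelihood estimator) of $\Lambda_\alpha$, we need to compute the first derivative
$l' = \frac{\partial l}{\partial \Lambda_\alpha}$:

$$l' = n_1 \frac{f_\alpha(C/\Lambda_\alpha)}{F_\alpha(C/\Lambda_\alpha)} \left(-\frac{C}{\Lambda_\alpha^2}\right) + n_2 \frac{-f_\alpha(C/\Lambda_\alpha)}{1 - F_\alpha(C/\Lambda_\alpha)} \left(-\frac{C}{\Lambda_\alpha^2}\right).$$

Setting $l' = 0$ yields the MLE solution, denoted by $\hat{\Lambda}_\alpha$:

$$F_\alpha^{-1}(n_1/n) = C/\hat{\Lambda}_\alpha \;\Longrightarrow\; \hat{\Lambda}_\alpha = C / F_\alpha^{-1}(n_1/n).$$

To assess the estimation variance of $\hat{\Lambda}_\alpha$, we resort to the classical theory of Fisher Information, which
says

$$\text{Var}\left(\hat{\Lambda}_\alpha\right) = \frac{1}{-E(l'')} + O\left(\frac{1}{n^2}\right).$$

After some algebra, we obtain

$$E(l'') = -\frac{n C^2}{\Lambda_\alpha^4}\, \frac{f_\alpha^2(C/\Lambda_\alpha)}{F_\alpha(C/\Lambda_\alpha)\left(1 - F_\alpha(C/\Lambda_\alpha)\right)}.$$

For convenience, we introduce $\eta = \Lambda_\alpha / C$. We summarize the above results in Theorem 1, which
also provides the exact expression of the $O\left(\frac{1}{n}\right)$ bias term using classical statistics results [?, 20].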
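
The closed-form 1-bit MLE $\hat{\Lambda}_\alpha = C / F_\alpha^{-1}(n_1/n)$ can be sketched as follows for the two cases where $F_\alpha^{-1}$ has a simple expression ($\alpha \to 0+$, where $F^{-1}(p) = 1/\log(1/p)$, and $\alpha = 1$, where $F^{-1}(p) = \tan(\pi p/2)$). The function names and the string label `"0+"` are our own conventions.

```python
import math


def encode_one_bit(y, C, alpha):
    """Store 0 if |y_j|^alpha <= C and 1 otherwise."""
    return [0 if abs(v) ** alpha <= C else 1 for v in y]


def one_bit_mle(n1, n, C, alpha):
    """1-bit MLE C / F_alpha^{-1}(n1/n); requires 0 < n1 < n.

    alpha may be the string "0+" (limit alpha -> 0+) or the number 1.
    """
    p = n1 / n
    if alpha == "0+":
        return C * math.log(1.0 / p)             # F^{-1}(p) = 1 / log(1/p)
    if alpha == 1:
        return C / math.tan(math.pi * p / 2.0)   # F^{-1}(p) = tan(pi p / 2)
    raise ValueError("only alpha = '0+' and alpha = 1 are sketched here")


if __name__ == "__main__":
    bits = encode_one_bit([0.2, -3.0, 0.7, 5.0], C=1.0, alpha=1)
    n1 = bits.count(0)  # number of samples at or below the threshold
    print(bits, one_bit_mle(n1, len(bits), 1.0, 1))
```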


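Theorem 1 Given $n$ i.i.d. samples $y_j \sim S(\alpha, \Lambda_\alpha)$, $j = 1$ to $n$, a threshold $C$, and $n_1 =
\sum_{j=1}^{n} 1\{z_j \le C\}$ with $z_j = |y_j|^\alpha$, the maximum likelihood estimator (MLE) of $\Lambda_\alpha$ is

$$\hat{\Lambda}_\alpha = C / F_\alpha^{-1}(n_1/n) \tag{9}$$

Denote $\eta = \Lambda_\alpha / C$. The asymptotic bias of $\hat{\Lambda}_\alpha$ is

$$E\left(\hat{\Lambda}_\alpha\right) = \Lambda_\alpha + \frac{\Lambda_\alpha}{n}\, F_\alpha(1/\eta)\left(1 - F_\alpha(1/\eta)\right) \left(\frac{\eta^2}{f_\alpha^2(1/\eta)} + \frac{\eta f_\alpha'(1/\eta)}{2 f_\alpha^3(1/\eta)}\right) + O\left(\frac{1}{n^2}\right) \tag{10}$$

and the asymptotic variance of $\hat{\Lambda}_\alpha$ is

$$\text{Var}\left(\hat{\Lambda}_\alpha\right) = \frac{\Lambda_\alpha^2}{n} V_\alpha(\eta) + O\left(\frac{1}{n^2}\right) \tag{11}$$

where

$$V_\alpha(\eta) = \eta^2\, \frac{F_\alpha(1/\eta)\left(1 - F_\alpha(1/\eta)\right)}{f_\alpha^2(1/\eta)} \tag{12}$$

where $f_\alpha$ and $F_\alpha$ are the pdf and cdf of $|S(\alpha, 1)|^\alpha$, respectively, and $f_\alpha'(z) = \frac{\partial f_\alpha(z)}{\partial z}$.

Proof: See the appendix. □
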
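
3.1 $\alpha \to 0+$

As $\alpha \to 0+$, we have $1/|s_\alpha|^\alpha \sim \exp(1)$. Thus

$$F_{0+}(z) = e^{-1/z}, \qquad f_{0+}(z) = \frac{1}{z^2} e^{-1/z}, \qquad F_{0+}^{-1}(z) = \frac{1}{\log(1/z)}.$$

We can then derive the estimator and its variance as

$$\hat{\Lambda}_{0+} = \frac{C}{F_{0+}^{-1}(n_1/n)} = C \log(n/n_1), \qquad \text{Var}\left(\hat{\Lambda}_{0+}\right) = \frac{\Lambda_{0+}^2}{n} V_{0+}(\eta) + O\left(\frac{1}{n^2}\right)$$

where

$$V_{0+}(\eta) = \eta^2\, \frac{F_{0+}(1/\eta)\left(1 - F_{0+}(1/\eta)\right)}{f_{0+}^2(1/\eta)} = \frac{e^{\eta} - 1}{\eta^2}.$$

The minimum of $V_{0+}(\eta)$ is 1.544, attained at $\eta = 1.594$. (In this paper, we keep 3 decimal places.)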
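
The stated minimum can be checked numerically. The sketch below evaluates $V_{0+}(\eta) = (e^{\eta} - 1)/\eta^2$ on a grid of step 0.001 (the grid range and step are our own choices) and reports the minimizer.

```python
import math


def v0_plus(eta):
    """1-bit variance factor for alpha -> 0+: (e^eta - 1) / eta^2."""
    return (math.exp(eta) - 1.0) / eta ** 2


# Grid search over eta in [0.1, 5.0] with step 0.001.
grid = [k / 1000.0 for k in range(100, 5001)]
eta_star = min(grid, key=v0_plus)
v_star = v0_plus(eta_star)

if __name__ == "__main__":
    print(round(eta_star, 3), round(v_star, 3))
```

Setting the derivative to zero gives the fixed-point condition $\eta = 2(1 - e^{-\eta})$, which could also be iterated directly instead of using a grid.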


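3.2 $\alpha = 1$

By properties of the Cauchy distribution, we know

$$F_1(z) = \frac{2}{\pi} \tan^{-1} z, \qquad f_1(z) = \frac{2}{\pi} \frac{1}{1 + z^2}, \qquad F_1^{-1}(z) = \tan\left(\frac{\pi}{2} z\right).$$

Thus, we can derive the estimator and variance

$$\hat{\Lambda}_1 = \frac{C}{\tan\left(\frac{\pi}{2} \frac{n_1}{n}\right)}, \qquad \text{Var}\left(\hat{\Lambda}_1\right) = \frac{\Lambda_1^2}{n} V_1(\eta) + O\left(\frac{1}{n^2}\right).$$

The minimum of $V_1(\eta)$ is attained at $\eta = 1$. To see this, let $t = 1/\eta$. Then $V_1(\eta) = \frac{1}{t^2} \frac{F_1(t)\left(1 - F_1(t)\right)}{f_1^2(t)}$ and

$$\frac{\partial \log V_1(\eta)}{\partial t} = -\frac{2}{t} + \frac{f_1(t)}{F_1(t)} - \frac{f_1(t)}{1 - F_1(t)} - \frac{2 f_1'(t)}{f_1(t)}$$

$$= -\frac{2}{t} + \frac{4t}{1 + t^2} + \frac{1}{1 + t^2} \left(\frac{1}{\tan^{-1} t} - \frac{1}{\frac{\pi}{2} - \tan^{-1} t}\right)$$

$$= \frac{2(t^2 - 1)}{t(1 + t^2)} + \frac{1}{1 + t^2} \left(\frac{1}{\tan^{-1} t} - \frac{1}{\frac{\pi}{2} - \tan^{-1} t}\right).$$

Setting $\frac{\partial \log V_1(\eta)}{\partial t} = 0$, the solution is $t = 1$. Hence the optimum is attained at $\eta = 1$.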

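3.3 $\alpha = 2$

Since $S(2, 1) \sim \sqrt{2} \times N(0, 1)$, i.e., $|s_2|^2 \sim 2\chi_1^2$, we have

$$F_2(z) = F_{\chi_1^2}(z/2), \qquad f_2(z) = f_{\chi_1^2}(z/2)/2,$$

where $F_{\chi_1^2}$ and $f_{\chi_1^2}$ are the cdf and pdf of a chi-square distribution with 1 degree of freedom,
respectively. The MLE is $\hat{\Lambda}_2 = \frac{C}{2 F_{\chi_1^2}^{-1}(n_1/n)}$ and the optimal variance of $\hat{\Lambda}_2$ is $\frac{\Lambda_2^2}{n}\, 3.066$, attained at
$\eta = \frac{\Lambda_2}{C} = 0.228$.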
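
The optimal thresholds for $\alpha = 1$ and $\alpha = 2$ can be reproduced with a short numerical check of the variance factor $\eta^2 F(1/\eta)(1 - F(1/\eta)) / f^2(1/\eta)$. The sketch below uses the identity $F_{\chi_1^2}(x) = \text{erf}(\sqrt{x/2})$ to avoid external libraries; the grid ranges are our own choices.

```python
import math


def v1(eta):
    """1-bit variance factor for alpha = 1 (Cauchy)."""
    t = 1.0 / eta
    F = 2.0 / math.pi * math.atan(t)
    f = 2.0 / math.pi / (1.0 + t * t)
    return eta ** 2 * F * (1.0 - F) / f ** 2


def v2(eta):
    """1-bit variance factor for alpha = 2, with |s_2|^2 ~ 2 * chi-square(1)."""
    x = 1.0 / (2.0 * eta)                 # argument z/2 with z = 1/eta
    F = math.erf(math.sqrt(x / 2.0))      # chi-square(1) cdf at x
    f_chi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)
    f = f_chi / 2.0                       # f_2(z) = f_chi(z/2) / 2
    return eta ** 2 * F * (1.0 - F) / f ** 2


grid1 = [k / 1000.0 for k in range(200, 5001)]
grid2 = [k / 1000.0 for k in range(50, 1001)]
eta1_star = min(grid1, key=v1)
eta2_star = min(grid2, key=v2)

if __name__ == "__main__":
    print(eta1_star, v1(eta1_star))
    print(eta2_star, v2(eta2_star))
```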

3.4 General $0 < \alpha \le 2$


For general $0 < \alpha \le 2$, the cdf $F_\alpha$ and pdf $f_\alpha$ can be computed numerically. Figure 1 plots $V_\alpha(\eta)$
for $\alpha$ from 0 to 2. The lowest point on each curve corresponds to the optimal (smallest) $V_\alpha(\eta)$.
Figure 2 plots the optimal $V_\alpha$ values (left panel) and optimal $\eta$ values (right panel).


Figure 1: The variance factor $V_\alpha(\eta)$ in (12) for $\alpha \in [0, 2]$, spaced at 0.1. The lowest point on each
curve corresponds to the optimal variance at that $\alpha$ value.


Figure 2: The optimal variance values $V_\alpha(\eta)$ (left panel) and the corresponding optimal $\eta$ values
(right panel). Each point on the curve corresponds to the lowest point of the curve for that $\alpha$ as
in Figure 1.


Figure 1 suggests that the 1-bit scheme performs reasonably well. The optimal variance coefficient
$V_\alpha$ is not much larger than the variance using full information. For example, when $\alpha = 1$,
the optimal variance coefficient using full information is 2 (see (8)), while the optimal variance
coefficient of the 1-bit scheme is just $\frac{\pi^2}{4} = 2.467$, which is only about 23% larger. Furthermore,
we can see that, at least when $\alpha \le 1$, $V_\alpha(\eta)$ is not very sensitive to $\eta$ over a wide range of $\eta$ values,
which is practically important, because an optimal choice of $\eta$ requires knowing $\Lambda_\alpha$ and is in general
not achievable. The best we can hope for is that the estimate is not sensitive to the choice of $\eta$.

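3.5 Error Tail Bounds

Theorem 2

$$\Pr\left(\hat{\Lambda}_\alpha \ge (1 + \epsilon)\Lambda_\alpha\right) \le \exp\left(-n\, G_{R,\alpha,C,\epsilon}\right), \quad \epsilon \ge 0$$

$$\Pr\left(\hat{\Lambda}_\alpha \le (1 - \epsilon)\Lambda_\alpha\right) \le \exp\left(-n\, G_{L,\alpha,C,\epsilon}\right), \quad 0 \le \epsilon \le 1$$

where $G_{R,\alpha,C,\epsilon}$ and $G_{L,\alpha,C,\epsilon}$ are computed as follows:

$$G_{R,\alpha,C,\epsilon} = -F_\alpha\left(\tfrac{1}{(1+\epsilon)\eta}\right) \log\left[\frac{F_\alpha(1/\eta)}{F_\alpha\left(\tfrac{1}{(1+\epsilon)\eta}\right)}\right] - \left(1 - F_\alpha\left(\tfrac{1}{(1+\epsilon)\eta}\right)\right) \log\left[\frac{1 - F_\alpha(1/\eta)}{1 - F_\alpha\left(\tfrac{1}{(1+\epsilon)\eta}\right)}\right] \tag{13}$$

$$G_{L,\alpha,C,\epsilon} = -F_\alpha\left(\tfrac{1}{(1-\epsilon)\eta}\right) \log\left[\frac{F_\alpha(1/\eta)}{F_\alpha\left(\tfrac{1}{(1-\epsilon)\eta}\right)}\right] - \left(1 - F_\alpha\left(\tfrac{1}{(1-\epsilon)\eta}\right)\right) \log\left[\frac{1 - F_\alpha(1/\eta)}{1 - F_\alpha\left(\tfrac{1}{(1-\epsilon)\eta}\right)}\right] \tag{14}$$

Proof: See the appendix. The proof is based on Chernoff's original tail bounds [?] for the binomial
distribution. □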


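To ensure the error $\Pr\left(\hat{\Lambda}_\alpha \ge (1 + \epsilon)\Lambda_\alpha\right) + \Pr\left(\hat{\Lambda}_\alpha \le (1 - \epsilon)\Lambda_\alpha\right) \le \delta$, $0 \le \delta \le 1$, it suffices that

$$\exp\left(-n\, G_{R,\alpha,C,\epsilon}\right) + \exp\left(-n\, G_{L,\alpha,C,\epsilon}\right) \le \delta \tag{15}$$

for which it suffices that

$$n \ge \frac{1}{G_{\alpha,C,\epsilon}} \log\frac{2}{\delta}, \quad \text{where } G_{\alpha,C,\epsilon} = \min\left\{G_{R,\alpha,C,\epsilon},\, G_{L,\alpha,C,\epsilon}\right\} \tag{16}$$

Obviously, it will be even more precise to numerically compute $n$ from (15) instead of using the
convenient sample complexity bound (16). Figure 3 provides the tail bound constants for $\alpha = 0+$,
i.e., $G_{R,0+,C,\epsilon}$ and $G_{L,0+,C,\epsilon}$, at selected $\eta$ values ranging from 1 to 2.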
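
The tail bound constants are binomial relative entropies (Kullback-Leibler divergences) between the shifted and the true cell probability, which makes them easy to compute. The sketch below does this for $\alpha \to 0+$, where $F_{0+}(1/\eta) = e^{-\eta}$, and derives a sufficient sample size by requiring each of the two exponential bounds to be at most $\delta/2$. The helper names are hypothetical, and $0 < \epsilon < 1$ is assumed.

```python
import math


def F0(z):
    """cdf of |s|^alpha as alpha -> 0+ : F(z) = exp(-1/z)."""
    return math.exp(-1.0 / z)


def binary_kl(a, p):
    """a*log(a/p) + (1-a)*log((1-a)/(1-p)), for 0 < a, p < 1."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))


def tail_constants(eta, eps, F=F0):
    """Return (G_R, G_L) for threshold parameter eta and 0 < eps < 1."""
    p = F(1.0 / eta)
    g_right = binary_kl(F(1.0 / ((1.0 + eps) * eta)), p)
    g_left = binary_kl(F(1.0 / ((1.0 - eps) * eta)), p)
    return g_right, g_left


def samples_needed(eta, eps, delta, F=F0):
    """Smallest integer n with n >= log(2/delta) / min(G_R, G_L)."""
    return math.ceil(math.log(2.0 / delta) / min(tail_constants(eta, eps, F)))


if __name__ == "__main__":
    print(tail_constants(1.5, 0.2))
    print(samples_needed(1.5, 0.2, 0.05))
```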



Figure 3: The tail bound constants $G_{R,0+,C,\epsilon}$ (13) (upper group) and $G_{L,0+,C,\epsilon}$ (14) (lower group),
plotted against $\epsilon$, for $\eta = 1$ to 2 spaced at 0.1. Recall $\eta = \Lambda_\alpha / C$.


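3.6 Bias-Correction

Bias-correction for the MLE is important for small sample size $n$. In Theorem 1, Eq. (10) says

$$E\left(\hat{\Lambda}_\alpha\right) = \Lambda_\alpha + \frac{\Lambda_\alpha}{n}\, F_\alpha(1/\eta)\left(1 - F_\alpha(1/\eta)\right) \left(\frac{\eta^2}{f_\alpha^2(1/\eta)} + \frac{\eta f_\alpha'(1/\eta)}{2 f_\alpha^3(1/\eta)}\right) + O\left(\frac{1}{n^2}\right)$$

which naturally provides a bias-correction for $\hat{\Lambda}_\alpha$, known as the "Bartlett correction" in statistics.
To do so, we will need to use the estimate $\hat{\Lambda}_\alpha$ to compute $\eta$. Since $\hat{\Lambda}_\alpha = C / F_\alpha^{-1}(n_1/n)$, we
have $\hat{\Lambda}_\alpha / C = 1 / F_\alpha^{-1}(n_1/n)$. The bias-corrected estimator, denoted by $\hat{\Lambda}_{\alpha,c}$, is

$$\hat{\Lambda}_{\alpha,c} = \frac{C / F_\alpha^{-1}(n_1/n)}{1 + \frac{1}{n} \frac{n_1}{n} \left(1 - \frac{n_1}{n}\right) \left(\frac{\hat{\eta}^2}{f_\alpha^2(1/\hat{\eta})} + \frac{\hat{\eta} f_\alpha'(1/\hat{\eta})}{2 f_\alpha^3(1/\hat{\eta})}\right)}, \quad \text{where } \hat{\eta} = 1 / F_\alpha^{-1}(n_1/n) \tag{17}$$

which, when $\alpha = 0+$, $\alpha = 1$, and $\alpha = 2$, becomes respectively

$$\hat{\Lambda}_{0+,c} = \frac{C \log(n/n_1)}{1 + \frac{1/n_1 - 1/n}{2 \log(n/n_1)}} \tag{18}$$

$$\hat{\Lambda}_{1,c} = \frac{C / \tan\left(\frac{\pi}{2} \frac{n_1}{n}\right)}{1 + \frac{1}{n} \frac{\pi^2}{4} \frac{n_1}{n} \left(1 - \frac{n_1}{n}\right) \left(1 + \frac{1}{\tan^2\left(\frac{\pi}{2} \frac{n_1}{n}\right)}\right)} \tag{19}$$

$$\hat{\Lambda}_{2,c} = \frac{C / \left(2 F_{\chi_1^2}^{-1}(n_1/n)\right)}{1 + \frac{1}{n} \frac{n_1}{n} \left(1 - \frac{n_1}{n}\right) \frac{\pi}{2} \left(\frac{3}{F_{\chi_1^2}^{-1}(n_1/n)} - 1\right) e^{F_{\chi_1^2}^{-1}(n_1/n)}} \tag{20}$$

See the detailed derivations in the appendix, together with the proof of Theorem 1.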

4 Experiments on 1-Bit Coding and Estimation 

We conduct extensive simulations to (i) verify the 1-bit variance formulas of the MLE, and (ii)
apply the 1-bit estimator in Algorithm 1 for one scan 1-bit compressed sensing [12].


4.1 Bias and Variance of the Proposed Estimators

Figure 4 provides the simulations for verifying the 1-bit estimator $\hat{\Lambda}_{0+}$ and its bias-corrected version
$\hat{\Lambda}_{0+,c}$ using small $\alpha$ (i.e., 0.05). Basically, for each sample size $n$, we generate $10^6$ samples from
$S(\alpha, 1)$, which are quantized according to a pre-selected threshold $C$. Then we apply both $\hat{\Lambda}_{0+}$ and
$\hat{\Lambda}_{0+,c}$ and report the empirical mean square error (MSE = variance + bias$^2$) from $10^6$ repetitions.
For thorough evaluations, we conduct simulations for a wide range of $n \in [5, 1000]$.

The results are presented in log-log scale, which exaggerates the portion for small $n$ and the
y-axis for large $n$. The plots confirm that when $n$ is not too small (e.g., $n > 100$), the bias of the MLE
estimate vanishes and the asymptotic variance formula (12) matches the mean square error. For
small $n$ (e.g., $n < 100$), the bias correction becomes important.

Note that when $n$ is large (i.e., when errors are very small), the plots show some discrepancies.
This is due to the fact that we have to use a small $\alpha$ for the simulations but the estimators $\hat{\Lambda}_{0+}$
and $\hat{\Lambda}_{0+,c}$ are based on $\alpha = 0+$. The differences are very small and only become visible when the
estimation errors are very small (due to the exaggeration of the log scale). To remove this effect, we
also conduct similar simulations for $\alpha = 1$ and present the results in Figure 5, which does not show
the discrepancies at large $n$. We can see that the bias-correction step is also important for $\alpha = 1$.

We should mention that, for numerical reasons, we added a small real number ($10^{-6}$) to $n_1$. We
did not further investigate various smoothing techniques, as it appears that this Bartlett-correction
procedure already serves the purpose well.
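
A scaled-down version of this kind of experiment is easy to reproduce. The sketch below is a small Monte Carlo check for $\alpha = 1$ only, with far fewer repetitions than reported in the text: it draws standard Cauchy samples, applies the uncorrected 1-bit estimator at $\eta = 1$, and compares $n \times \text{MSE}$ with the theoretical factor $\pi^2/4$. Clamping $n_1$ to $[10^{-6}, n - 10^{-6}]$ is our own guard, in the spirit of the small constant mentioned above.

```python
import math
import random


def simulate_one_bit_cauchy(n, reps, eta=1.0, lam=1.0, seed=1):
    """Empirical MSE of the 1-bit MLE for alpha = 1 at threshold C = lam / eta."""
    rng = random.Random(seed)
    C = lam / eta
    total_sq_err = 0.0
    for _ in range(reps):
        n1 = 0
        for _ in range(n):
            y = lam * math.tan(rng.uniform(-math.pi / 2, math.pi / 2))  # Cauchy
            if abs(y) <= C:
                n1 += 1
        n1 = min(max(n1, 1e-6), n - 1e-6)  # keep tan() away from 0 and infinity
        est = C / math.tan(math.pi * n1 / (2.0 * n))
        total_sq_err += (est - lam) ** 2
    return total_sq_err / reps


if __name__ == "__main__":
    n = 500
    mse = simulate_one_bit_cauchy(n, reps=1000)
    print(n * mse, math.pi ** 2 / 4)
```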














Figure 4: Empirical mean square errors of $\hat{\Lambda}_{0+}$ (dashed curves) and $\hat{\Lambda}_{0+,c}$ (solid curves) from $10^6$
simulations of $S(\alpha, 1)$ for $\alpha = 0.05$, at each sample size $n$ (horizontal axis). Each panel presents results for a different
$\eta = \Lambda_\alpha / C$. For both estimators, the empirical MSEs converge to the theoretical asymptotic variances
(12) (dashed dot curves and blue if color is available) when $n$ is large enough. In each panel, the
lowest curve (dashed dot and green if color is available) represents the theoretical variance using
full (infinite-bit) information, i.e., $1/n$ in this case. For small $n$, the bias-correction step is important.
Note that the small (and exaggerated) discrepancies at large $n$ are due to the use of $\alpha = 0.05$ to
generate samples and the estimators based on $\alpha = 0+$. Also recall that $\eta = 1.594$ is the optimal $\eta$.





Figure 5: Mean square errors of $\hat{\Lambda}_1$ (dashed curves) and $\hat{\Lambda}_{1,c}$ (solid curves) for $\alpha = 1$, plotted
against the sample size $n$. Note that the lowest curve (dashed dot and green if color is available)
in each panel represents the optimal variance using full (i.e., infinite-bit) information, which is $2/n$ for $\alpha = 1$.



4.2 One Scan 1-Bit Compressed Sensing 

Next, we integrate $\hat{\Lambda}_{0+,c}$ into the sparse recovery procedure in Algorithm 1 by replacing $K$ with
$\hat{\Lambda}_{0+,c}$ for computing $Q_i^+$ and $Q_i^-$ [12]. We report the sign recovery errors $\sum_i |\widehat{\text{sgn}}(x_i) - \text{sgn}(x_i)| / K$
from $10^{?}$ simulations. In this study, we let $N = 1000$, $K = 20$, and sample the nonzero coordinates
from $N(0, 5^2)$. For estimating $K$, we use $n \in \{20, 50, 100\}$ samples with $\eta \in \{0.2, 0.5, 1.5, 2, 3\}$.
Recall that $\eta = 1.5$ is close to the optimal value (1.594) for $\hat{\Lambda}_{0+}$.

Figure 6 reports the sign recovery errors at the 75% quantile (upper panels) and the 95% quantile
(bottom panels). The number of measurements for sparse recovery is chosen according to
$M = \zeta K \log(N/0.01)$, although we only use $n \in \{20, 50, 100\}$ samples to estimate $K$. For comparison,
Figure 6 also reports the results for estimating $K$ using $n$ full (i.e., infinite-bit) samples.

When $n = 100$, except for $\eta = 0.2$ (which is too small), the performance of $\hat{\Lambda}_{0+,c}$ is fairly stable
with no essential difference from the estimator using full information. The performance of $\hat{\Lambda}_{0+,c}$
deteriorates with decreasing $n$. But even for $n = 20$, $\hat{\Lambda}_{0+,c}$ at $\eta = 1.5$ still performs well.




















Figure 6: Sign recovery error: $\sum_i |\widehat{\text{sgn}}(x_i) - \text{sgn}(x_i)| / K$, using Algorithm 1 and estimated $K$ in
computing $Q_i^+$ and $Q_i^-$ in Algorithm 1. In this study, $N = 1000$, $K = 20$, and the nonzero entries
are generated from $N(0, 5^2)$. The number of measurements for recovery is $M = \zeta K \log(N/0.01)$
and we use $n$ samples to estimate $K$ for $n \in \{20, 50, 100\}$. We report 75% (upper panels) and 95%
(bottom panels) quantiles of the sign recovery errors, from $10^{?}$ repetitions. We estimate $K$ using
the full information (i.e., the estimator (6)) as well as the 1-bit estimator $\hat{\Lambda}_{0+,c}$ with selected values of
$\eta \in \{0.2, 0.5, 1.5, 2, 3\}$. When $n = 100$, except for $\eta = 0.2$ (which is too small), the performance
of $\hat{\Lambda}_{0+,c}$ is fairly stable with no essential difference from the estimator using full information. The
performance of $\hat{\Lambda}_{0+,c}$ deteriorates with decreasing $n$. But even when $n = 20$, the performance of
$\hat{\Lambda}_{0+,c}$ at $\eta = 1.5$ (which is close to optimal) is still very good. Note that, when a curve does not
show in the panel (e.g., $n = 50$, $\eta = 3$, and 95%), it means the error is too large to fit in.

5 2-Bit Coding and Estimation


As shown by theoretical analysis and simulations, the performance of 1-bit coding and estimation
is fairly good and stable for a wide range of threshold values. Nevertheless, it is desirable to further
stabilize the estimates (and lower the variance) by using more bits.

With the 2-bit scheme, we need to introduce 3 threshold values: $C_1 < C_2 < C_3$. With $z_j = |y_j|^\alpha$, we define

$$p_1 = \Pr(z_j \le C_1) = F_\alpha(C_1/\Lambda_\alpha)$$

$$p_2 = \Pr(C_1 < z_j \le C_2) = F_\alpha(C_2/\Lambda_\alpha) - F_\alpha(C_1/\Lambda_\alpha)$$

$$p_3 = \Pr(C_2 < z_j \le C_3) = F_\alpha(C_3/\Lambda_\alpha) - F_\alpha(C_2/\Lambda_\alpha)$$

$$p_4 = \Pr(z_j > C_3) = 1 - F_\alpha(C_3/\Lambda_\alpha)$$

and

$$n_1 = \sum_{j=1}^{n} 1\{z_j \le C_1\}, \qquad n_2 = \sum_{j=1}^{n} 1\{C_1 < z_j \le C_2\},$$

$$n_3 = \sum_{j=1}^{n} 1\{C_2 < z_j \le C_3\}, \qquad n_4 = \sum_{j=1}^{n} 1\{z_j > C_3\}.$$

The log-likelihood of these $n = n_1 + n_2 + n_3 + n_4$ observations can be expressed as

$$l = n_1 \log p_1 + n_2 \log p_2 + n_3 \log p_3 + n_4 \log p_4$$

$$= n_1 \log F_\alpha(C_1/\Lambda_\alpha) + n_2 \log\left[F_\alpha(C_2/\Lambda_\alpha) - F_\alpha(C_1/\Lambda_\alpha)\right] + n_3 \log\left[F_\alpha(C_3/\Lambda_\alpha) - F_\alpha(C_2/\Lambda_\alpha)\right] + n_4 \log\left[1 - F_\alpha(C_3/\Lambda_\alpha)\right],$$

from which we can derive the MLE and variance as presented in Theorem 3.


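Theorem 3 Given $n$ i.i.d. samples $y_j \sim S(\alpha, \Lambda_\alpha)$, $j = 1$ to $n$, three thresholds $0 < C_1 < C_2 < C_3$,
$z_j = |y_j|^\alpha$, $n_1 = \sum_{j=1}^{n} 1\{z_j \le C_1\}$, $n_2 = \sum_{j=1}^{n} 1\{C_1 < z_j \le C_2\}$, $n_3 = \sum_{j=1}^{n} 1\{C_2 < z_j \le C_3\}$,
$n_4 = \sum_{j=1}^{n} 1\{z_j > C_3\}$, and

$$\eta_1 = \frac{\Lambda_\alpha}{C_1}, \qquad \eta_2 = \frac{\Lambda_\alpha}{C_2}, \qquad \eta_3 = \frac{\Lambda_\alpha}{C_3},$$

the MLE, denoted by $\hat{\Lambda}_\alpha$, is the solution to the following equation:

$$0 = n_1 \frac{C_1 f_\alpha(1/\eta_1)}{F_\alpha(1/\eta_1)} + n_2 \frac{C_2 f_\alpha(1/\eta_2) - C_1 f_\alpha(1/\eta_1)}{F_\alpha(1/\eta_2) - F_\alpha(1/\eta_1)} + n_3 \frac{C_3 f_\alpha(1/\eta_3) - C_2 f_\alpha(1/\eta_2)}{F_\alpha(1/\eta_3) - F_\alpha(1/\eta_2)} + n_4 \frac{-C_3 f_\alpha(1/\eta_3)}{1 - F_\alpha(1/\eta_3)}.$$

The asymptotic variance of the MLE is

$$\text{Var}\left(\hat{\Lambda}_\alpha\right) = \frac{\Lambda_\alpha^2}{n} V_\alpha(\eta_1, \eta_2, \eta_3) + O\left(\frac{1}{n^2}\right)$$

where the variance factor can be expressed as

$$\frac{1}{V_\alpha(\eta_1, \eta_2, \eta_3)} = \frac{1}{\eta_1^2} \frac{f_\alpha^2(1/\eta_1)}{F_\alpha(1/\eta_1)} + \frac{1}{\eta_3^2} \frac{f_\alpha^2(1/\eta_3)}{1 - F_\alpha(1/\eta_3)} + \frac{\left[f_\alpha(1/\eta_2)/\eta_2 - f_\alpha(1/\eta_1)/\eta_1\right]^2}{F_\alpha(1/\eta_2) - F_\alpha(1/\eta_1)} + \frac{\left[f_\alpha(1/\eta_3)/\eta_3 - f_\alpha(1/\eta_2)/\eta_2\right]^2}{F_\alpha(1/\eta_3) - F_\alpha(1/\eta_2)}.$$

The asymptotic bias is

$$E\left(\hat{\Lambda}_\alpha\right) = \Lambda_\alpha \left(1 + \frac{1}{nB} + \frac{D}{2nB^2}\right) + O\left(\frac{1}{n^2}\right)$$

where, writing $f_i = f_\alpha(C_i/\Lambda_\alpha)$, $f_i' = f_\alpha'(C_i/\Lambda_\alpha)$, and $F_i = F_\alpha(C_i/\Lambda_\alpha)$ for $i = 1, 2, 3$,

$$B = \left(\frac{C_1}{\Lambda_\alpha}\right)^2 \frac{f_1^2}{F_1} + \frac{\left[\frac{C_2}{\Lambda_\alpha} f_2 - \frac{C_1}{\Lambda_\alpha} f_1\right]^2}{F_2 - F_1} + \frac{\left[\frac{C_3}{\Lambda_\alpha} f_3 - \frac{C_2}{\Lambda_\alpha} f_2\right]^2}{F_3 - F_2} + \left(\frac{C_3}{\Lambda_\alpha}\right)^2 \frac{f_3^2}{1 - F_3}$$

and

$$D = \left(\frac{C_1}{\Lambda_\alpha}\right)^3 \frac{f_1 f_1'}{F_1} + \frac{\left[\frac{C_2}{\Lambda_\alpha} f_2 - \frac{C_1}{\Lambda_\alpha} f_1\right] \left[\left(\frac{C_2}{\Lambda_\alpha}\right)^2 f_2' - \left(\frac{C_1}{\Lambda_\alpha}\right)^2 f_1'\right]}{F_2 - F_1} + \frac{\left[\frac{C_3}{\Lambda_\alpha} f_3 - \frac{C_2}{\Lambda_\alpha} f_2\right] \left[\left(\frac{C_3}{\Lambda_\alpha}\right)^2 f_3' - \left(\frac{C_2}{\Lambda_\alpha}\right)^2 f_2'\right]}{F_3 - F_2} + \left(\frac{C_3}{\Lambda_\alpha}\right)^3 \frac{f_3 f_3'}{1 - F_3}.$$

Proof: See the appendix. □

The asymptotic bias formula in Theorem 3 leads to a bias-corrected estimator

$$\hat{\Lambda}_{\alpha,c} = \frac{\hat{\Lambda}_\alpha}{1 + \frac{1}{nB} + \frac{D}{2nB^2}} \tag{21}$$

where $B$ and $D$ are evaluated at $\hat{\Lambda}_\alpha$.
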


Note that, with a slight abuse of notation, we still use $\hat{\Lambda}_\alpha$ to denote the MLE of the 2-bit scheme
and we rely on the number of parameters (e.g., $\eta_1$, $\eta_2$, $\eta_3$) to differentiate $\hat{\Lambda}_\alpha$ for different schemes.
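
Since the 2-bit MLE has no closed form, it has to be found numerically. The following sketch does so for $\alpha = 1$ by maximizing the 4-cell log-likelihood over a logarithmic grid of candidate scale values. The grid search, its range, and the function names are our own illustrative choices; any one-dimensional optimizer or root finder would serve.

```python
import math


def F1(z):
    """cdf of |s_1| for the standard Cauchy: (2/pi) * atan(z)."""
    return 2.0 / math.pi * math.atan(z)


def log_likelihood(lam, counts, thresholds, F=F1):
    """Multinomial log-likelihood of the cell counts (n1, n2, n3, n4)."""
    c1, c2, c3 = thresholds
    cdf = [0.0, F(c1 / lam), F(c2 / lam), F(c3 / lam), 1.0]
    return sum(k * math.log(cdf[i + 1] - cdf[i])
               for i, k in enumerate(counts) if k > 0)


def two_bit_mle(counts, thresholds, F=F1, lo=1e-3, hi=1e3, steps=20000):
    """Maximize the log-likelihood over a log-spaced grid on [lo, hi]."""
    def candidate(s):
        return lo * (hi / lo) ** (s / steps)

    best = max(range(steps + 1),
               key=lambda s: log_likelihood(candidate(s), counts, thresholds, F))
    return candidate(best)


if __name__ == "__main__":
    thresholds = (0.5, 1.0, 2.0)
    # Idealized counts: exact cell probabilities at true scale 1, times n = 1000.
    cdf = [0.0] + [F1(c) for c in thresholds] + [1.0]
    counts = [1000 * (cdf[i + 1] - cdf[i]) for i in range(4)]
    print(two_bit_mle(counts, thresholds))
```

With idealized counts the maximizer coincides with the true scale, which gives a convenient correctness check before running on sampled data.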


5.1 $\alpha \to 0+$


In this case, we can slightly simplify the expression:

$$V_{0+}(\eta_1, \eta_2, \eta_3) = \frac{1}{\frac{(\eta_1 - \eta_2)^2}{e^{\eta_1} - e^{\eta_2}} + \frac{(\eta_2 - \eta_3)^2}{e^{\eta_2} - e^{\eta_3}} + \frac{\eta_3^2}{e^{\eta_3} - 1}}$$

Numerically, the minimum of $V_{0+}(\eta_1, \eta_2, \eta_3)$ is 1.122, attained at $\eta_1 = 3.365$, $\eta_2 = 1.771$, $\eta_3 = 0.754$.
The value 1.122 is substantially smaller than 1.544, which is the minimum variance coefficient of
the 1-bit scheme. Figure 7 illustrates that, with the 2-bit scheme, the variance is less sensitive to
the choice of the thresholds, compared to the 1-bit scheme.


In practice, there are at least two simple strategies for selecting the parameters $\eta_1 > \eta_2 > \eta_3$:

• Strategy 1: First select a "small" $\eta_3$, then let $\eta_2 = t\eta_3$ and $\eta_1 = t\eta_2$, for some $t > 1$.

• Strategy 2: First select a "small" $\eta_3$ and a "large" $\eta_1$, then select a "reasonable" $\eta_2$ in between.

See the plots for examples of the two strategies in Figure 7. We reiterate that for the task of
estimating $\Lambda_\alpha$ using only a few bits, we must choose parameters (thresholds) beforehand. While in
general the optimal results are not attainable, as long as the chosen parameters fall in a "reasonable"
range (which is fairly wide), the estimation variance will not be far away from the optimal value.
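
To make Strategy 1 concrete for $\alpha \to 0+$, the sketch below evaluates the 2-bit variance factor through the closed form $1/V_{0+} = \frac{(\eta_1-\eta_2)^2}{e^{\eta_1}-e^{\eta_2}} + \frac{(\eta_2-\eta_3)^2}{e^{\eta_2}-e^{\eta_3}} + \frac{\eta_3^2}{e^{\eta_3}-1}$, which follows from $F_{0+}(z) = e^{-1/z}$, and scans a few $(t, \eta_3)$ combinations. The function names and the particular $\eta_3$ values are our own choices.

```python
import math


def v0_two_bit(eta1, eta2, eta3):
    """2-bit variance factor for alpha -> 0+; requires eta1 > eta2 > eta3 > 0."""
    inv = ((eta1 - eta2) ** 2 / (math.exp(eta1) - math.exp(eta2))
           + (eta2 - eta3) ** 2 / (math.exp(eta2) - math.exp(eta3))
           + eta3 ** 2 / (math.exp(eta3) - 1.0))
    return 1.0 / inv


def strategy1(eta3, t):
    """Strategy 1: eta2 = t * eta3 and eta1 = t * eta2 for some t > 1."""
    return v0_two_bit(t * t * eta3, t * eta3, eta3)


if __name__ == "__main__":
    for t in (2, 3, 4):
        print(t, [round(strategy1(e3, t), 3) for e3 in (0.25, 0.5, 0.75, 1.0)])
```

Comparing the printed values against the 1-bit optimum of 1.544 gives a direct feel for how forgiving the 2-bit scheme is about the choice of $\eta_3$.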

Figure 7: Left (strategy 1): $V_{0+}(\eta_1, \eta_2, \eta_3)$ for $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$, at $t = 2, 3, 4$, with varying $\eta_3$.
Right (strategy 2): $V_{0+}$ for fixed $\eta_1 = 5$, $\eta_3 \in \{0.5, 0.75, 1\}$, and $\eta_2$ varying between $\eta_3$ and $\eta_1$.


5.2 $\alpha = 1$

Numerically, the minimum of $V_1(\eta_1, \eta_2, \eta_3)$ is 2.087, attained at $\eta_1 = 1.927$, $\eta_2 = 1.000$, $\eta_3 = 0.519$.
Note that the value 2.087 is very close to the optimal variance coefficient 2 using full information.
Figure 8 plots examples of $V_1(\eta_1, \eta_2, \eta_3)$ for both "strategy 1" and "strategy 2".




Figure 8: Left (strategy 1): $V_1(\eta_1, \eta_2, \eta_3)$ for $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$, at $t = 2, 3, 4$, with varying $\eta_3$.
Right (strategy 2): $V_1$ for fixed $\eta_1 = 3$, $\eta_3 \in \{0.25, 0.5, 0.75\}$, and $\eta_2$ varying between $\eta_3$ and $\eta_1$.


5.3 $\alpha = 2$

Numerically, the minimum of $V_2(\eta_1, \eta_2, \eta_3)$ is 2.236, attained at $\eta_1 = 0.546$, $\eta_2 = 0.195$, $\eta_3 =
0.093$. Figure 9 presents examples of $V_2(\eta_1, \eta_2, \eta_3)$ for both strategies for choosing $\eta_1$, $\eta_2$, and $\eta_3$.




Figure 9: Left (strategy 1): $V_2(\eta_1, \eta_2, \eta_3)$ for $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$, at $t = 2, 3, 4$, with varying $\eta_3$.
Right (strategy 2): $V_2$ for fixed $\eta_1 = 1$, $\eta_3 \in \{0.05, 0.1, 0.2\}$, and $\eta_2$ varying between $\eta_3$ and $\eta_1$.

5.4 Simulations


Figure 10 presents the simulation results for verifying the 2-bit estimator $\hat{\Lambda}_{0+}$ and its bias-corrected
version $\hat{\Lambda}_{0+,c}$. For simplicity, we choose $\eta_3 \in \{0.05, 0.1, 0.25, 0.75, 1.5, 2\}$ and we fix $\eta_2 = 3\eta_3$,
$\eta_1 = 3\eta_2$. Although these choices are not optimal, we can see from Figure 10 that the estimators
still perform well for such a wide range of $\eta_3$ values. Compared to the 1-bit estimators, the 2-bit
estimators are noticeably more accurate and less sensitive to parameters. Again, the bias-correction
step is useful when the sample size $n$ is not large.

Similar to Figure 4, we can observe some discrepancies at large $n$ (as magnified by the log scale
of the y-axis). Again, this is because we simulate the data using $\alpha = 0.05$ and we use estimators
based on $\alpha = 0+$. To remove this effect, we also provide simulations for $\alpha = 1$ in Figure 11.





Figure 10: Empirical mean square errors of the 2-bit estimators: $\hat{\Lambda}_{0+}$ (dashed curves) and $\hat{\Lambda}_{0+,c}$
(solid curves), for $10^6$ simulations at each sample size $n$ (horizontal axis). We use $\alpha = 0.05$ to generate stable samples
$S(\alpha, 1)$ and we consider 6 different $\eta_3 = \Lambda_\alpha / C_3$ values presented in 6 panels. We always let $\eta_2 = 3\eta_3$
and $\eta_1 = 3\eta_2$. For both estimators, the empirical MSEs converge to the theoretical asymptotic
variances (12) (dashed dot curves and blue if color is available) when $n$ is not small. In each panel,
the lowest curve (dashed dot and green if color is available) represents the theoretical variances
using full (infinite-bit) information, i.e., $1/n$ in this case. When $n$ is small, the bias-correction step
is important. Note that the small (and exaggerated) discrepancies at large $n$ are due to the fact that
we use $\alpha = 0.05$ to simulate the data and use estimators based on $\alpha = 0+$.

Figure 11: Mean square errors of the 2-bit estimator $\hat{\Lambda}_1$ (dashed curves) and its bias-corrected
version $\hat{\Lambda}_{1,c}$ (solid curves), for $\alpha = 1$, plotted against the sample size $n$, by using 6 different $\eta_3$ values
(one for each panel) and fixing $\eta_2 = 3\eta_3$, $\eta_1 = 3\eta_2$. The lowest curve (dashed dot and green if color is
available) in each panel represents the optimal variance using full information, which is $2/n$ for $\alpha = 1$.


5.5 Efficient Computational Procedure for the MLE Solutions 

With the 1-bit scheme, the cost for computing the MLE is negligible because of the closed-form
solution. With the 2-bit scheme, however, the computational cost might be a concern if we try to
find the MLE solution numerically every time (at run time). A computationally efficient solution
is to tabulate the results. To see this, we can re-write the (normalized) log-likelihood function

$$\frac{l}{n} = \frac{n_1}{n} \log F_\alpha(1/\eta_1) + \frac{n_2}{n} \log\left[F_\alpha(1/\eta_2) - F_\alpha(1/\eta_1)\right] + \frac{n_3}{n} \log\left[F_\alpha(1/\eta_3) - F_\alpha(1/\eta_2)\right] + \frac{n - (n_1 + n_2 + n_3)}{n} \log\left[1 - F_\alpha(1/\eta_3)\right].$$

This means we only need to tabulate the results for the combinations of $n_1/n$, $n_2/n$, $n_3/n$ (which all
vary between 0 and 1). Suppose we tabulate $T$ values for each $n_i/n$ (i.e., at an accuracy of $1/T$);
then the table size is only $T^3$, which is merely $10^6$ if we let $T = 100$.

Here we conduct a simulation study for $\alpha = 1$ and $T \in \{20, 50, 100, 200\}$, as presented in
Figure 12. We let $\eta_3 = 0.5$, $\eta_2 = 3\eta_3$, $\eta_1 = 3\eta_2$. We can see that the results are already good when
$T = 100$ (or even just $T = 50$). This confirms the effectiveness of the tabulation scheme.

Therefore, tabulation provides an efficient solution to the computational problem of finding
the MLE. Here, we have presented only a simple tabulation scheme based on uniform grids. It is
possible to improve the scheme by using, for example, adaptive grids.

Figure 12: Mean square errors of the 2-bit tabulation-based estimator $\hat{\Lambda}_1$ (dashed curves) and its
bias-corrected version $\hat{\Lambda}_{1,c}$ (solid curves), plotted against the sample size $n$, for $\alpha = 1$ and
$T \in \{20, 50, 100, 200\}$ tabulation levels, by fixing $\eta_3 = 0.5$, $\eta_2 = 3\eta_3$, $\eta_1 = 3\eta_2$. The lowest curve
(dashed dot and green if color is available) in each panel represents the optimal variance using full
information, which is $2/n$ for $\alpha = 1$.


6 Multi-Bit (Multi-Partition) Coding and Estimation 

Clearly, we can extend this methodology to more than 2 bits. With more bits, it is more flexible
to consider schemes based on $(m + 1)$ partitions. For example, $m = 1$ for the 1-bit scheme, $m = 3$
for the 2-bit scheme, and $m = 7$ for the 3-bit scheme. We feel $m \le 5$ is practical. The asymptotic
variance of the MLE $\hat{\Lambda}_\alpha$ can be expressed as

$$\text{Var}\left(\hat{\Lambda}_\alpha\right) = \frac{\Lambda_\alpha^2}{n} V_\alpha(\eta_1, \ldots, \eta_m) + O\left(\frac{1}{n^2}\right), \quad \text{where}$$

$$\frac{1}{V_\alpha(\eta_1, \ldots, \eta_m)} = \frac{1}{\eta_1^2} \frac{f_\alpha^2(1/\eta_1)}{F_\alpha(1/\eta_1)} + \frac{1}{\eta_m^2} \frac{f_\alpha^2(1/\eta_m)}{1 - F_\alpha(1/\eta_m)} + \sum_{s=1}^{m-1} \frac{\left[f_\alpha(1/\eta_{s+1})/\eta_{s+1} - f_\alpha(1/\eta_s)/\eta_s\right]^2}{F_\alpha(1/\eta_{s+1}) - F_\alpha(1/\eta_s)}.$$

Here, we provide some numerical results for $m = 5$, to demonstrate that using more partitions
does further reduce the estimation variances and further stabilize the estimates in that the
estimation accuracy is not as sensitive to parameters.
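
The general $(m+1)$-partition variance factor is straightforward to evaluate once $F_\alpha$ and $f_\alpha$ are available. The sketch below implements it for the two cases with closed-form cdf and pdf ($\alpha \to 0+$ and $\alpha = 1$) and evaluates it at optimal parameter values reported in the text. The function names are ours.

```python
import math


def F0(z):
    return math.exp(-1.0 / z)            # cdf for alpha -> 0+


def f0(z):
    return math.exp(-1.0 / z) / z ** 2   # pdf for alpha -> 0+


def F1(z):
    return 2.0 / math.pi * math.atan(z)  # cdf for alpha = 1


def f1(z):
    return 2.0 / math.pi / (1.0 + z * z)  # pdf for alpha = 1


def variance_factor(etas, F, f):
    """V(eta_1, ..., eta_m) for eta_1 > ... > eta_m > 0 (m thresholds)."""
    a = [f(1.0 / e) / e for e in etas]   # f(1/eta_s) / eta_s
    P = [F(1.0 / e) for e in etas]       # F(1/eta_s), increasing in s
    inv = a[0] ** 2 / P[0] + a[-1] ** 2 / (1.0 - P[-1])
    inv += sum((a[s + 1] - a[s]) ** 2 / (P[s + 1] - P[s])
               for s in range(len(etas) - 1))
    return 1.0 / inv


if __name__ == "__main__":
    print(variance_factor([1.594], F0, f0))
    print(variance_factor([3.365, 1.771, 0.754], F0, f0))
    print(variance_factor([4.464, 2.871, 1.853, 1.099, 0.499], F0, f0))
    print(variance_factor([1.927, 1.000, 0.519], F1, f1))
    print(variance_factor([2.602, 1.498, 1.001, 0.668, 0.385], F1, f1))
```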

6.1 $\alpha = 0+$ and $m = 5$

Numerically, the minimum of $V_{0+}(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ is 1.055, attained at $\eta_1 = 4.464$, $\eta_2 = 2.871$, $\eta_3 =
1.853$, $\eta_4 = 1.099$, $\eta_5 = 0.499$. Figure 13 (right panel) plots $V_{0+}(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ for varying $\eta_5$
and $\eta_i = t\eta_{i+1}$, $i = 4, 3, 2, 1$. For comparison, we also plot (in the left panel) $V_{0+}(\eta_1, \eta_2, \eta_3)$ for
varying $\eta_3$, and $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$. We can see that with more partitions, the performance becomes
significantly more robust.

Figure 13: Left (4-partition): $V_{0+}(\eta_1, \eta_2, \eta_3)$ for varying $\eta_3$ and $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$, at $t = 2, 3, 4$.
Right (6-partition): $V_{0+}(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ for varying $\eta_5$ and $\eta_i = t\eta_{i+1}$, at $t = 2, 3, 4$.

6.2 $\alpha = 1$ and $m = 5$

Numerically, the minimum of $V_1(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ is 2.036, attained at $\eta_1 = 2.602$, $\eta_2 = 1.498$, $\eta_3 =
1.001$, $\eta_4 = 0.668$, $\eta_5 = 0.385$. Figure 14 (right panel) plots $V_1(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ for varying $\eta_5$
and $\eta_i = t\eta_{i+1}$, $i = 4, 3, 2, 1$. Again, for comparison, we also plot (in the left panel) $V_1(\eta_1, \eta_2, \eta_3)$
for varying $\eta_3$, and $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$. Clearly, using more partitions stabilizes the variances even
when the parameters are chosen less optimally.




Figure 14: Left (4-partition): $V_1(\eta_1, \eta_2, \eta_3)$ for varying $\eta_3$ and $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$, at $t = 2, 3, 4$.
Right (6-partition): $V_1(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ for varying $\eta_5$ and $\eta_i = t\eta_{i+1}$, at $t = 2, 3, 4$.

6.3 $\alpha = 2$ and $m = 5$

Numerically, the minimum of $V_2(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ is 2.106, attained at
$\eta_1 = 0.893$, $\eta_2 = 0.339$, $\eta_3 = 0.184$, $\eta_4 = 0.111$, $\eta_5 = 0.068$. Figure 15 (right panel)
plots $V_2(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ for varying $\eta_5$ and $\eta_i = t\eta_{i+1}$, $i = 4, 3, 2, 1$,
as well as (left panel) $V_2(\eta_1, \eta_2, \eta_3)$ for varying $\eta_3$, with $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$.




Figure 15: Left (4-partition): $V_2(\eta_1, \eta_2, \eta_3)$ for varying $\eta_3$ and $\eta_2 = t\eta_3$, $\eta_1 = t\eta_2$, at $t = 2, 3, 4$.
Right (6-partition): $V_2(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$ for varying $\eta_5$ and $\eta_i = t\eta_{i+1}$, at $t = 2, 3, 4$.
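
The variance factor $V_\alpha(\eta_1, \ldots, \eta_m)$ used in Sections 6.1 to 6.3 can be evaluated directly from the closed-form $F_\alpha$ and $f_\alpha$ for $\alpha = 0+$, 1, and 2. The following is a minimal sketch, not the author's code; the function names `cdf_pdf` and `variance_factor` are hypothetical, and the $\eta_s$ are assumed to be given in decreasing order.

```python
import math

def cdf_pdf(alpha):
    """Return (F, f), the CDF and PDF of |S(alpha, 1)|^alpha, for alpha in {"0+", 1, 2}."""
    if alpha == "0+":
        F = lambda z: math.exp(-1.0 / z)
        f = lambda z: math.exp(-1.0 / z) / z ** 2
    elif alpha == 1:
        F = lambda z: 2.0 / math.pi * math.atan(z)
        f = lambda z: 2.0 / math.pi / (1.0 + z * z)
    elif alpha == 2:
        # erf(sqrt(z) / 2) equals 2 * Phi(sqrt(z / 2)) - 1
        F = lambda z: math.erf(math.sqrt(z) / 2.0)
        f = lambda z: math.exp(-z / 4.0) / (2.0 * math.sqrt(math.pi * z))
    else:
        raise ValueError("closed forms are only coded for alpha in {'0+', 1, 2}")
    return F, f

def variance_factor(alpha, etas):
    """V_alpha(eta_1, ..., eta_m) for the (m+1)-partition scheme (etas decreasing)."""
    F, f = cdf_pdf(alpha)
    z = [1.0 / e for e in etas]      # C_s / Lambda, increasing in s
    Fz = [F(v) for v in z]
    g = [v * f(v) for v in z]        # f(1/eta_s) / eta_s
    inv_v = g[0] ** 2 / Fz[0] + g[-1] ** 2 / (1.0 - Fz[-1])
    for s in range(len(z) - 1):
        inv_v += (g[s + 1] - g[s]) ** 2 / (Fz[s + 1] - Fz[s])
    return 1.0 / inv_v

v_0plus = variance_factor("0+", [4.464, 2.871, 1.853, 1.099, 0.499])
v_1 = variance_factor(1, [2.602, 1.498, 1.001, 0.668, 0.385])
v_2 = variance_factor(2, [0.893, 0.339, 0.184, 0.111, 0.068])
```

With $m = 1$ and $\alpha = 1$, `variance_factor(1, [1.0])` reduces to the 1-bit value $\pi^2/4$ quoted in the abstract, which gives a quick sanity check of the formula.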

7 Extension and Future Work


Previously, counts and the MLE were used to improve classical minwise hashing and b-bit minwise
hashing. In this paper, we focus on coding schemes for $\alpha$-stable random projections on individual
data vectors. We feel an important line of future work would be the study of coding schemes for
analyzing the relation of two or multiple data vectors, which will be useful, for example, in the
context of large-scale machine learning and efficient search/retrieval in massive data.

For example, prior work considered nonnegative data vectors under the sum-to-one constraint (i.e., the
$l_1$ norm = 1). After applying Cauchy stable random projections separately on two data vectors,
the collision probability of the two signs of the projected data is essentially monotonic in the
$\chi^2$ similarity (which is popular in computer vision). Now the open question is: if we do
not know the $l_1$ norms, how should we design coding schemes so that we can still evaluate the
$\chi^2$ similarity (or other similarities) using Cauchy random projections?

Another recent paper [13] re-visited classical Gaussian random projections (i.e., $\alpha = 2$). By
assuming unit $l_2$ norms for the data vectors, [13] developed multi-bit coding schemes and estimators
for the correlation between vectors. Can we, using just a few bits, still estimate the correlation if
at the same time we must also estimate the $l_2$ norms?

8 Conclusion 

Motivated by the recent work on “one scan 1-bit compressed sensing”, we have developed 1-bit
and multi-bit coding schemes for estimating the scale parameter of $\alpha$-stable distributions. These
simple coding schemes (even with just 1 bit) perform well in that, if the parameters are chosen
appropriately, their variances are not much larger than the variances using full (i.e., infinite-bit)
information. In general, using more bits increases the computational or storage cost (e.g.,
the cost of tabulations), with the benefit of stabilizing the performance so that the estimation
variances do not increase much even when the parameters are far from optimal. In practice, we
expect that the $(m+1)$-partition scheme, combined with tabulation, with $m = 3$, 4, or 5, should be
overall preferable. Here $m = 3$ corresponds to the 2-bit scheme and $m = 1$ to the 1-bit scheme.

A Proof of Theorem 1 and Bias Corrections

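The log-likelihood of the $n = n_1 + n_2$ observations is

$$l = n_1 \log F_\alpha(C/\Lambda_\alpha) + n_2 \log\left[1 - F_\alpha(C/\Lambda_\alpha)\right]$$

and its first derivative is

$$l' = \frac{\partial l}{\partial \Lambda_\alpha} = n_1\left(-\frac{C}{\Lambda_\alpha^2}\right)\frac{f_\alpha(C/\Lambda_\alpha)}{F_\alpha(C/\Lambda_\alpha)} - n_2\left(-\frac{C}{\Lambda_\alpha^2}\right)\frac{f_\alpha(C/\Lambda_\alpha)}{1 - F_\alpha(C/\Lambda_\alpha)} = \left(-\frac{C}{\Lambda_\alpha^2}\right) f \left(\frac{n_1 - nF}{F - F^2}\right)$$

For simplicity, we use $F$, $f$, $f'$, $f''$ for $F_\alpha(C/\Lambda_\alpha)$, $f_\alpha(C/\Lambda_\alpha)$, $f'_\alpha(C/\Lambda_\alpha)$, and $f''_\alpha(C/\Lambda_\alpha)$, respectively.

Setting $l' = 0$ leads to the MLE solution: $n_1/n = F_\alpha(C/\hat\Lambda_\alpha)$, i.e., $\hat\Lambda_\alpha = C / F_\alpha^{-1}(n_1/n)$.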

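According to classical statistical results [2, 20],

$$E\left(\hat\Lambda_\alpha\right) = \Lambda_\alpha - \frac{E(l')^3 + E(l'l'')}{2I^2} + O\left(\frac{1}{n^2}\right)$$

$$\mathrm{Var}\left(\hat\Lambda_\alpha\right) = \frac{1}{I} + O\left(\frac{1}{n^2}\right)$$

where the Fisher Information $I = E(l')^2 = -E(l'')$, and $E(l')^k$ denotes $E\left[(l')^k\right]$. Here the derivatives $l'$, $l''$, $l'''$ are with respect
to $\Lambda_\alpha$. Thus, we need to compute the derivatives of $l$ and evaluate their expectations.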

By properties of the binomial distribution, we have $E(n_1) = nF$ and

$$E(n_1 - E(n_1)) = 0$$

$$E(n_1 - E(n_1))^2 = nF(1-F)$$

$$E(n_1 - E(n_1))^3 = nF(1-F)(1-2F)$$
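
The three binomial central moments above can be confirmed by summing over the probability mass function exactly. The sketch below is only a numerical check; the helper name `binomial_central_moment` and the values $n = 12$, $F = 0.3$ are arbitrary illustrative choices.

```python
from math import comb

def binomial_central_moment(n, p, k):
    """Exact k-th central moment of Binomial(n, p), summed over the pmf."""
    mean = n * p
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * (x - mean) ** k
               for x in range(n + 1))

n, F = 12, 0.3
m1 = binomial_central_moment(n, F, 1)   # should vanish
m2 = binomial_central_moment(n, F, 2)   # n F (1 - F)
m3 = binomial_central_moment(n, F, 3)   # n F (1 - F)(1 - 2F)
```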


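Obviously

$$E(l') = \left(-\frac{C}{\Lambda_\alpha^2}\right) f \left(\frac{nF - nF}{F - F^2}\right) = 0$$

Furthermore,

$$E(l')^2 = \left(-\frac{C}{\Lambda_\alpha^2}\right)^2 f^2 \left(\frac{nF(1-F)}{(F-F^2)^2}\right) = \frac{C^2}{\Lambda_\alpha^4}\frac{n f^2}{F(1-F)}$$

$$E(l')^3 = \left(-\frac{C}{\Lambda_\alpha^2}\right)^3 f^3 \left(\frac{nF(1-F)(1-2F)}{(F-F^2)^3}\right) = -\frac{C^3}{\Lambda_\alpha^6}\frac{n f^3 (1-2F)}{F^2(1-F)^2}$$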


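Next we work on the second derivative

$$l'' = -\frac{2}{\Lambda_\alpha} l' + \left(-\frac{C}{\Lambda_\alpha^2}\right)^2 f' \left(\frac{n_1 - nF}{F - F^2}\right) + \left(-\frac{C}{\Lambda_\alpha^2}\right)^2 f^2 \left(\frac{-n_1 + 2n_1F - nF^2}{(F - F^2)^2}\right)$$

whose expectation is

$$E(l'') = \left(\frac{C}{\Lambda_\alpha^2}\right)^2 f^2 \left(\frac{-nF + 2nF^2 - nF^2}{(F - F^2)^2}\right) = -\frac{C^2}{\Lambda_\alpha^4}\frac{n f^2}{F(1-F)} = -I$$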

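For the higher-order derivatives, it is convenient to rewrite $l''$ as

$$l'' = \left[-\frac{2}{\Lambda_\alpha} - \frac{C}{\Lambda_\alpha^2}\left(\frac{f'}{f} + \frac{f}{1-F}\right)\right] l' - \frac{C^2}{\Lambda_\alpha^4}\frac{n_1 f^2}{F^2(1-F)}$$

so that

$$l''' = \left[\frac{2}{\Lambda_\alpha^2} + \frac{2C}{\Lambda_\alpha^3}\left(\frac{f'}{f} + \frac{f}{1-F}\right) + \frac{C^2}{\Lambda_\alpha^4}\left(\frac{f''f - (f')^2}{f^2} + \frac{f'(1-F) + f^2}{(1-F)^2}\right)\right] l' + \left[-\frac{2}{\Lambda_\alpha} - \frac{C}{\Lambda_\alpha^2}\left(\frac{f'}{f} + \frac{f}{1-F}\right)\right] l''$$

$$\qquad + \frac{4C^2}{\Lambda_\alpha^5}\frac{n_1 f^2}{F^2(1-F)} + \left(\frac{C}{\Lambda_\alpha^2}\right)^3 \frac{2n_1 f f'(F^2 - F^3) - n_1 f^3(2F - 3F^2)}{F^4(1-F)^2}$$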


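Taking expectations,

$$E(l'') = E\left\{\left[-\frac{2}{\Lambda_\alpha} - \frac{C}{\Lambda_\alpha^2}\left(\frac{f'}{f} + \frac{f}{1-F}\right)\right] l' - \frac{C^2}{\Lambda_\alpha^4}\frac{n_1 f^2}{F^2(1-F)}\right\} = -\frac{C^2}{\Lambda_\alpha^4}\frac{n f^2}{F(1-F)}$$

$$E(n_1 l') = \left(-\frac{C}{\Lambda_\alpha^2}\right) f \left(\frac{E(n_1(n_1 - nF))}{F - F^2}\right) = \left(-\frac{C}{\Lambda_\alpha^2}\right) f \left(\frac{nF(1-F)}{F - F^2}\right) = -\frac{C}{\Lambda_\alpha^2} n f$$

$$E(l'l'') = \left[-\frac{2}{\Lambda_\alpha} - \frac{C}{\Lambda_\alpha^2}\left(\frac{f'}{f} + \frac{f}{1-F}\right)\right]\frac{C^2}{\Lambda_\alpha^4}\frac{n f^2}{F(1-F)} + \frac{C^3}{\Lambda_\alpha^6}\frac{n f^3}{F^2(1-F)}$$

$$\qquad = \frac{C^3}{\Lambda_\alpha^6}\frac{n f^3(1-2F)}{F^2(1-F)^2} - \left(\frac{2}{\Lambda_\alpha} + \frac{C}{\Lambda_\alpha^2}\frac{f'}{f}\right)\frac{C^2}{\Lambda_\alpha^4}\frac{n f^2}{F(1-F)}$$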

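Therefore, since the $f^3$ terms in $E(l')^3$ and $E(l'l'')$ cancel, the bias-correction term is

$$\frac{E(l')^3 + E(l'l'')}{2I^2} = \frac{-\frac{2}{\Lambda_\alpha} - \frac{C}{\Lambda_\alpha^2}\frac{f'}{f}}{2\frac{C^2}{\Lambda_\alpha^4}\frac{n f^2}{F(1-F)}} = -\frac{\Lambda_\alpha}{n} F(1-F)\frac{2 + z\frac{f'}{f}}{2z^2f^2} = -\frac{\Lambda_\alpha}{n}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\frac{2 + z\frac{f'}{f}}{2z^2f^2}$$

where we denote $z = C/\Lambda_\alpha$ and the last expression is evaluated at the MLE. Note that since $l' = 0$ at the MLE, we have $z = C/\hat\Lambda_\alpha = F_\alpha^{-1}(n_1/n)$, $F_\alpha(z) = n_1/n$.
Next, we derive more explicit expressions for $\alpha = 0+$, $\alpha = 1$, and $\alpha = 2$.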

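When $\alpha \to 0+$,

$$F_{0+}(z) = e^{-1/z}, \quad f_{0+}(z) = \frac{1}{z^2}e^{-1/z}, \quad f'_{0+}(z) = \left(\frac{1}{z^4} - \frac{2}{z^3}\right)e^{-1/z}, \quad F_{0+}^{-1}(t) = \frac{1}{\log 1/t}$$

$$2 + z\frac{f'(z)}{f(z)} = \frac{1}{z} = \frac{1}{F_{0+}^{-1}(n_1/n)} = \log n/n_1$$

$$z^2 f^2(z) = \frac{1}{z^2}e^{-2/z} = \frac{\log^2(n/n_1)}{(n/n_1)^2}$$

$$\frac{E(l')^3 + E(l'l'')}{2I^2} = -\frac{\Lambda_\alpha}{n}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\frac{2 + z\frac{f'}{f}}{2z^2f^2} = -\frac{\Lambda_\alpha}{n}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\frac{(n/n_1)^2}{2\log n/n_1} = -\frac{\Lambda_\alpha}{n}\frac{n/n_1 - 1}{2\log n/n_1}$$

Recall that $l' = 0 \Longrightarrow n_1/n = F_{0+}(C/\hat\Lambda_{0+}) = e^{-\hat\Lambda_{0+}/C} \Longrightarrow \hat\Lambda_{0+} = C\log n/n_1$.

Therefore, the bias-corrected MLE for $\alpha \to 0+$ is

$$\hat\Lambda_{0+,c} = \frac{C \log n/n_1}{1 + \frac{1/n_1 - 1/n}{2\log n/n_1}}$$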
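
As an illustration, the 1-bit MLE $C\log(n/n_1)$ and its bias-corrected version can be coded in a few lines. This is a hedged sketch: the function names are hypothetical, and the simulated data assume the $\alpha \to 0+$ limit in which $z_j = \Lambda/E_j$ with $E_j$ a standard exponential variable; the values of `true_lambda`, `C`, and `n` are arbitrary.

```python
import math
import random

def mle_0plus(n1, n, C):
    """1-bit MLE for alpha -> 0+: C * log(n / n1)."""
    return C * math.log(n / n1)

def mle_0plus_corrected(n1, n, C):
    """Bias-corrected MLE: divide by 1 + (1/n1 - 1/n) / (2 log(n/n1))."""
    log_ratio = math.log(n / n1)
    return C * log_ratio / (1.0 + (1.0 / n1 - 1.0 / n) / (2.0 * log_ratio))

# Simulated check (assumption: z_j = Lambda / E_j, E_j ~ Exp(1)).
rng = random.Random(0)
true_lambda, C, n = 3.0, 2.0, 20000
z = [true_lambda / rng.expovariate(1.0) for _ in range(n)]
n1 = sum(1 for v in z if v <= C)          # only 1 bit per measurement is used
est = mle_0plus(n1, n, C)
est_c = mle_0plus_corrected(n1, n, C)
```

The correction factor exceeds 1 whenever $0 < n_1 < n$, so the corrected estimate is slightly smaller than the plain MLE; for large $n$ the two are nearly identical.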


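When $\alpha = 1$, by properties of the Cauchy distribution, we know

$$F_1(z) = \frac{2}{\pi}\tan^{-1} z, \quad f_1(z) = \frac{2}{\pi}\frac{1}{1 + z^2}, \quad f'_1(z) = \frac{2}{\pi}\frac{-2z}{(1 + z^2)^2}$$

$$2 + z\frac{f'(z)}{f(z)} = 2 + \frac{-2z^2}{1 + z^2} = \frac{2}{1 + z^2}, \quad z^2f^2(z) = \frac{4}{\pi^2}\frac{z^2}{(1 + z^2)^2}$$

Note that $z = C/\hat\Lambda_1 = F_1^{-1}(n_1/n) = \tan\left(\frac{\pi}{2}\frac{n_1}{n}\right)$, i.e., $\tan^{-1} z = \frac{\pi}{2}\frac{n_1}{n}$. We have

$$\frac{2 + z\frac{f'}{f}}{2z^2f^2} = \frac{\pi^2}{4}\frac{1 + z^2}{z^2} = \frac{\pi^2}{4}\left(1 + \frac{1}{\tan^2\left(\frac{\pi}{2}\frac{n_1}{n}\right)}\right)$$

$$\frac{E(l')^3 + E(l'l'')}{2I^2} = -\frac{\Lambda_\alpha}{n}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\frac{2 + z\frac{f'}{f}}{2z^2f^2} = -\frac{\Lambda_\alpha}{n}\frac{\pi^2}{4}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\left(1 + \frac{1}{\tan^2\left(\frac{\pi}{2}\frac{n_1}{n}\right)}\right)$$

Therefore, the bias-corrected MLE for $\alpha = 1$ is

$$\hat\Lambda_{1,c} = \frac{C\big/\tan\left(\frac{\pi}{2}\frac{n_1}{n}\right)}{1 + \frac{1}{n}\frac{\pi^2}{4}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\left(1 + \frac{1}{\tan^2\left(\frac{\pi}{2}\frac{n_1}{n}\right)}\right)}$$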
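
A short sketch of the $\alpha = 1$ estimators follows, under the assumption that the measurements are $z_j = |x_j|$ with $x_j$ Cauchy distributed with scale $\Lambda_1$. The function names and the simulation settings (`true_lambda`, `C`, `n`, the seed) are illustrative choices, not taken from the paper.

```python
import math
import random

def mle_cauchy(n1, n, C):
    """1-bit MLE for alpha = 1: C / tan(pi * n1 / (2 n))."""
    return C / math.tan(math.pi * n1 / (2.0 * n))

def mle_cauchy_corrected(n1, n, C):
    """Bias-corrected MLE for alpha = 1."""
    r = n1 / n
    t = math.tan(math.pi * r / 2.0)
    correction = 1.0 + (math.pi ** 2 / (4.0 * n)) * r * (1.0 - r) * (1.0 + 1.0 / t ** 2)
    return C / t / correction

# Simulated check (assumption: x_j = Lambda * tan(pi (U - 1/2)), U uniform on [0, 1)).
rng = random.Random(1)
true_lambda, C, n = 2.0, 2.0, 20000
z = [abs(true_lambda * math.tan(math.pi * (rng.random() - 0.5))) for _ in range(n)]
n1 = sum(1 for v in z if v <= C)
est = mle_cauchy(n1, n, C)
est_c = mle_cauchy_corrected(n1, n, C)
```

Choosing $C = \Lambda_1$ (that is, $\eta = 1$) gives $F_1(1) = 1/2$, so roughly half of the bits are ones in this example.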


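When $\alpha = 2$, since $S(2,1) \sim \sqrt{2} \times N(0,1)$, i.e., $|S(2,1)|^2 \sim 2 \times \chi^2_1$, we have

$$F_2(z) = F_{\chi^2_1}(z/2) = 2\Phi\left(\sqrt{z/2}\right) - 1, \quad F_2^{-1}(t) = 2F_{\chi^2_1}^{-1}(t) = 2\left[\Phi^{-1}\left(\frac{t + 1}{2}\right)\right]^2$$

$$z = F_2^{-1}(n_1/n) = 2\left[\Phi^{-1}\left(\frac{n_1/n + 1}{2}\right)\right]^2$$

$$f_2(z) = f_{\chi^2_1}(z/2)/2 = \frac{1}{2\sqrt{\pi z}}e^{-z/4}, \quad f'_2(z) = -\frac{1}{4\sqrt{\pi}z^{3/2}}e^{-z/4} - \frac{1}{8\sqrt{\pi z}}e^{-z/4}$$

$$\frac{f'_2(z)}{f_2(z)} = -\frac{1}{2z} - \frac{1}{4}, \quad z^2f_2^2(z) = \frac{z}{4\pi}e^{-z/2}$$

$$\frac{E(l')^3 + E(l'l'')}{2I^2} = -\frac{\Lambda_\alpha}{n}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\frac{2 + z\frac{f'}{f}}{2z^2f^2} = -\frac{\Lambda_\alpha}{n}\,\pi\,\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\left(\frac{3}{z} - \frac{1}{2}\right)e^{z/2}$$

Therefore, the bias-corrected MLE for $\alpha = 2$ is

$$\hat\Lambda_{2,c} = \frac{C\big/F_2^{-1}(n_1/n)}{1 + \frac{\pi}{n}\frac{n_1}{n}\left(1 - \frac{n_1}{n}\right)\left(\frac{3}{F_2^{-1}(n_1/n)} - \frac{1}{2}\right)e^{F_2^{-1}(n_1/n)/2}}$$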
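
The $\alpha = 2$ case only needs the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi^{-1}$; the function names and the simulation settings are assumptions made for illustration, with $x_j \sim \sqrt{2\Lambda_2}\,N(0,1)$ and $z_j = x_j^2$.

```python
import math
import random
from statistics import NormalDist

def f2_inverse(t):
    """F_2^{-1}(t) = 2 * [Phi^{-1}((t + 1) / 2)]^2 for 0 < t < 1."""
    return 2.0 * NormalDist().inv_cdf((t + 1.0) / 2.0) ** 2

def mle_gaussian(n1, n, C):
    """1-bit MLE for alpha = 2: C / F_2^{-1}(n1 / n)."""
    return C / f2_inverse(n1 / n)

def mle_gaussian_corrected(n1, n, C):
    """Bias-corrected MLE for alpha = 2."""
    r = n1 / n
    z = f2_inverse(r)
    correction = 1.0 + (math.pi / n) * r * (1.0 - r) * (3.0 / z - 0.5) * math.exp(z / 2.0)
    return C / z / correction

# Simulated check (assumption: z_j = x_j^2 = 2 * Lambda * N(0, 1)^2).
rng = random.Random(2)
true_lambda, C, n = 1.5, 4.0, 20000
z_values = [2.0 * true_lambda * rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
n1 = sum(1 for v in z_values if v <= C)
est = mle_gaussian(n1, n, C)
est_c = mle_gaussian_corrected(n1, n, C)
```

Note that the correction term changes sign when $F_2^{-1}(n_1/n) > 6$, so the corrected estimate is not always smaller than the plain MLE.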


B Proof of Theorem 2


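The task is to prove the following two bounds:

$$\Pr\left(\hat\Lambda_\alpha \ge (1 + \epsilon)\Lambda_\alpha\right) \le \exp\left(-n\frac{\epsilon^2}{G_{R,\alpha,C,\epsilon}}\right), \quad \epsilon > 0$$

$$\Pr\left(\hat\Lambda_\alpha \le (1 - \epsilon)\Lambda_\alpha\right) \le \exp\left(-n\frac{\epsilon^2}{G_{L,\alpha,C,\epsilon}}\right), \quad 0 < \epsilon < 1$$

The proof is based on the expression of the MLE estimator $\hat\Lambda_\alpha = C/F_\alpha^{-1}(n_1/n)$, the fact that
$n_1 \sim \mathrm{Binomial}(n, F_\alpha(1/\eta))$ with $\eta = \Lambda_\alpha/C$, and Chernoff’s original tail bounds [6] for the binomial distribution.
For the right tail bound, we have

$$\Pr\left(\hat\Lambda_\alpha \ge (1 + \epsilon)\Lambda_\alpha\right) = \Pr\left(\frac{C}{F_\alpha^{-1}(n_1/n)} \ge (1 + \epsilon)\Lambda_\alpha\right) = \Pr\left(\frac{n_1}{n} \le F_\alpha\left(\frac{C}{(1 + \epsilon)\Lambda_\alpha}\right)\right) = \Pr\left(\frac{n_1}{n} \le F_\alpha\left(\frac{1}{(1 + \epsilon)\eta}\right)\right)$$

$$\le \left[\frac{F_\alpha(1/\eta)}{F_\alpha(1/(1+\epsilon)\eta)}\right]^{nF_\alpha(1/(1+\epsilon)\eta)} \left[\frac{1 - F_\alpha(1/\eta)}{1 - F_\alpha(1/(1+\epsilon)\eta)}\right]^{n - nF_\alpha(1/(1+\epsilon)\eta)} = \exp\left(-n\frac{\epsilon^2}{G_{R,\alpha,C,\epsilon}}\right)$$

where

$$\frac{\epsilon^2}{G_{R,\alpha,C,\epsilon}} = -F_\alpha(1/(1+\epsilon)\eta)\log\left[\frac{F_\alpha(1/\eta)}{F_\alpha(1/(1+\epsilon)\eta)}\right] - \left(1 - F_\alpha(1/(1+\epsilon)\eta)\right)\log\left[\frac{1 - F_\alpha(1/\eta)}{1 - F_\alpha(1/(1+\epsilon)\eta)}\right]$$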
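
Next, for the left tail bound, we have

$$\Pr\left(\hat\Lambda_\alpha \le (1 - \epsilon)\Lambda_\alpha\right) = \Pr\left(\frac{C}{F_\alpha^{-1}(n_1/n)} \le (1 - \epsilon)\Lambda_\alpha\right) = \Pr\left(\frac{n_1}{n} \ge F_\alpha\left(\frac{C}{(1 - \epsilon)\Lambda_\alpha}\right)\right)$$

$$\le \left[\frac{F_\alpha(1/\eta)}{F_\alpha(1/(1-\epsilon)\eta)}\right]^{nF_\alpha(1/(1-\epsilon)\eta)} \left[\frac{1 - F_\alpha(1/\eta)}{1 - F_\alpha(1/(1-\epsilon)\eta)}\right]^{n - nF_\alpha(1/(1-\epsilon)\eta)} = \exp\left(-n\frac{\epsilon^2}{G_{L,\alpha,C,\epsilon}}\right)$$

where

$$\frac{\epsilon^2}{G_{L,\alpha,C,\epsilon}} = -F_\alpha(1/(1-\epsilon)\eta)\log\left[\frac{F_\alpha(1/\eta)}{F_\alpha(1/(1-\epsilon)\eta)}\right] - \left(1 - F_\alpha(1/(1-\epsilon)\eta)\right)\log\left[\frac{1 - F_\alpha(1/\eta)}{1 - F_\alpha(1/(1-\epsilon)\eta)}\right]$$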
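
Both exponents are Kullback-Leibler divergences between two Bernoulli distributions, which makes the bounds easy to evaluate and to compare with the exact binomial tail. The following sketch does this for $\alpha = 1$; the function names and the values of `n`, `eta`, and `eps` are illustrative assumptions.

```python
import math
from math import comb

def F1(z):
    """CDF of |S(1, 1)| (absolute standard Cauchy)."""
    return 2.0 / math.pi * math.atan(z)

def bernoulli_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p); equals eps^2 / G."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def tail_bound(n, eta, eps, side, F=F1):
    """Right-tail (side=+1) or left-tail (side=-1) Chernoff bound on the 1-bit MLE."""
    p = F(1.0 / eta)
    q = F(1.0 / ((1.0 + side * eps) * eta))
    return math.exp(-n * bernoulli_kl(q, p))

def exact_tail(n, eta, eps, side, F=F1):
    """Exact probability of the same event, using n1 ~ Binomial(n, F(1/eta))."""
    p = F(1.0 / eta)
    q = F(1.0 / ((1.0 + side * eps) * eta))
    if side > 0:   # estimate too large  <=>  n1 <= n q
        ks = range(0, math.floor(n * q) + 1)
    else:          # estimate too small  <=>  n1 >= n q
        ks = range(math.ceil(n * q), n + 1)
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in ks)

n, eta, eps = 200, 1.0, 0.3
right_bound, right_exact = tail_bound(n, eta, eps, +1), exact_tail(n, eta, eps, +1)
left_bound, left_exact = tail_bound(n, eta, eps, -1), exact_tail(n, eta, eps, -1)
```

In this example the bounds are valid but conservative: each exact tail probability lies below its Chernoff bound.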


C Proof of Theorem 3


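With the 2-bit scheme, we need to introduce 3 threshold values, $C_1 < C_2 < C_3$, and define

$$p_1 = \Pr(z_j \le C_1) = F_\alpha(C_1/\Lambda_\alpha)$$

$$p_2 = \Pr(C_1 < z_j \le C_2) = F_\alpha(C_2/\Lambda_\alpha) - F_\alpha(C_1/\Lambda_\alpha)$$

$$p_3 = \Pr(C_2 < z_j \le C_3) = F_\alpha(C_3/\Lambda_\alpha) - F_\alpha(C_2/\Lambda_\alpha)$$

$$p_4 = \Pr(z_j > C_3) = 1 - F_\alpha(C_3/\Lambda_\alpha)$$

and

$$n_1 = \sum_{j=1}^n 1\{z_j \le C_1\}, \quad n_2 = \sum_{j=1}^n 1\{C_1 < z_j \le C_2\}, \quad n_3 = \sum_{j=1}^n 1\{C_2 < z_j \le C_3\}, \quad n_4 = \sum_{j=1}^n 1\{z_j > C_3\}$$

The log-likelihood of these $n = n_1 + n_2 + n_3 + n_4$ observations can be expressed as

$$l = n_1 \log p_1 + n_2 \log p_2 + n_3 \log p_3 + n_4 \log p_4$$

$$= n_1 \log F_\alpha(C_1/\Lambda_\alpha) + n_2 \log\left[F_\alpha(C_2/\Lambda_\alpha) - F_\alpha(C_1/\Lambda_\alpha)\right] + n_3 \log\left[F_\alpha(C_3/\Lambda_\alpha) - F_\alpha(C_2/\Lambda_\alpha)\right] + n_4 \log\left[1 - F_\alpha(C_3/\Lambda_\alpha)\right]$$

$$= n_1 \log F_1 + n_2 \log(F_2 - F_1) + n_3 \log(F_3 - F_2) + n_4 \log(1 - F_3)$$

where $F_i = F_\alpha(C_i/\Lambda_\alpha)$.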
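
To seek the MLE of $\Lambda_\alpha$, we need to compute the first derivative:

$$l' = \frac{\partial l}{\partial \Lambda_\alpha} = n_1\frac{f_\alpha(C_1/\Lambda_\alpha)\left(-\frac{C_1}{\Lambda_\alpha^2}\right)}{F_\alpha(C_1/\Lambda_\alpha)} + n_2\frac{f_\alpha(C_2/\Lambda_\alpha)\left(-\frac{C_2}{\Lambda_\alpha^2}\right) - f_\alpha(C_1/\Lambda_\alpha)\left(-\frac{C_1}{\Lambda_\alpha^2}\right)}{F_\alpha(C_2/\Lambda_\alpha) - F_\alpha(C_1/\Lambda_\alpha)}$$

$$\qquad + n_3\frac{f_\alpha(C_3/\Lambda_\alpha)\left(-\frac{C_3}{\Lambda_\alpha^2}\right) - f_\alpha(C_2/\Lambda_\alpha)\left(-\frac{C_2}{\Lambda_\alpha^2}\right)}{F_\alpha(C_3/\Lambda_\alpha) - F_\alpha(C_2/\Lambda_\alpha)} + n_4\frac{-f_\alpha(C_3/\Lambda_\alpha)\left(-\frac{C_3}{\Lambda_\alpha^2}\right)}{1 - F_\alpha(C_3/\Lambda_\alpha)}$$

$$= n_1\frac{\left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1}{F_1} + n_2\frac{\left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2 - \left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1}{F_2 - F_1} + n_3\frac{\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3 - \left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2}{F_3 - F_2} + n_4\frac{-\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3}{1 - F_3}$$

where $f_i = f_\alpha(C_i/\Lambda_\alpha)$. Since $E(n_1) = nF_1$, $E(n_2) = n(F_2 - F_1)$, $E(n_3) = n(F_3 - F_2)$, $E(n_4) = n(1 - F_3)$, the numerators telescope and we have

$$E(l') = 0$$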
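
The 2-bit MLE has no closed form in general, so the log-likelihood is maximized numerically. The sketch below illustrates this for $\alpha = 1$ with a coarse grid search over $\log\Lambda$ followed by a local ternary refinement. This is a swapped-in generic optimizer, not the look-up-table procedure mentioned in the paper; the function names, the thresholds `(1.0, 2.0, 4.0)`, and the search range are illustrative assumptions.

```python
import math
import random

def cauchy_abs_cdf(z):
    """F_1(z): CDF of the absolute value of a standard Cauchy variable."""
    return 2.0 / math.pi * math.atan(z)

def two_bit_counts(z_values, thresholds):
    """Return [n1, n2, n3, n4] for thresholds C1 < C2 < C3."""
    c1, c2, c3 = thresholds
    counts = [0, 0, 0, 0]
    for v in z_values:
        counts[(v > c1) + (v > c2) + (v > c3)] += 1   # bin index 0..3
    return counts

def log_likelihood(lam, counts, thresholds, cdf=cauchy_abs_cdf):
    """l = n1 log F1 + n2 log(F2 - F1) + n3 log(F3 - F2) + n4 log(1 - F3)."""
    F_1, F_2, F_3 = (cdf(c / lam) for c in thresholds)
    probs = [F_1, F_2 - F_1, F_3 - F_2, 1.0 - F_3]
    return sum(n_i * math.log(p_i) for n_i, p_i in zip(counts, probs) if n_i > 0)

def two_bit_mle(counts, thresholds, lo=1e-3, hi=1e3, grid=400, iters=60):
    """Maximize l over Lambda in [lo, hi]: log-spaced grid, then ternary search."""
    log_lo, log_hi = math.log(lo), math.log(hi)
    step = (log_hi - log_lo) / (grid - 1)

    def ll(log_lam):
        return log_likelihood(math.exp(log_lam), counts, thresholds)

    best = max(range(grid), key=lambda i: ll(log_lo + i * step))
    a = log_lo + max(best - 1, 0) * step
    b = log_lo + min(best + 1, grid - 1) * step
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if ll(m1) < ll(m2):
            a = m1
        else:
            b = m2
    return math.exp(0.5 * (a + b))

rng = random.Random(3)
true_lambda, thresholds, n = 2.0, (1.0, 2.0, 4.0), 20000
z = [abs(true_lambda * math.tan(math.pi * (rng.random() - 0.5))) for _ in range(n)]
counts = two_bit_counts(z, thresholds)
estimate = two_bit_mle(counts, thresholds)
```

Only the four counts enter the estimator, so each measurement needs just 2 bits of storage.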


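Next, we compute the Fisher Information. Differentiating $l'$ once more, with $[\cdot]'$ denoting the derivative with respect to $\Lambda_\alpha$,

$$l'' = n_1\frac{\left[\left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1\right]' F_1 - \left[\left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1\right]^2}{F_1^2} + n_2\frac{\left[\left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2 - \left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1\right]'(F_2 - F_1) - \left[\left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2 - \left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1\right]^2}{(F_2 - F_1)^2}$$

$$\qquad + n_3\frac{\left[\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3 - \left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2\right]'(F_3 - F_2) - \left[\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3 - \left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2\right]^2}{(F_3 - F_2)^2} + n_4\frac{\left[-\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3\right]'(1 - F_3) - \left[-\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3\right]^2}{(1 - F_3)^2}$$

Taking expectations, the terms involving $[\cdot]'$ telescope to zero, so that

$$-I = E(l'') = -n\left\{\frac{\left[\frac{C_1}{\Lambda_\alpha^2}f_1\right]^2}{F_1} + \frac{\left[\frac{C_2}{\Lambda_\alpha^2}f_2 - \frac{C_1}{\Lambda_\alpha^2}f_1\right]^2}{F_2 - F_1} + \frac{\left[\frac{C_3}{\Lambda_\alpha^2}f_3 - \frac{C_2}{\Lambda_\alpha^2}f_2\right]^2}{F_3 - F_2} + \frac{\left[\frac{C_3}{\Lambda_\alpha^2}f_3\right]^2}{1 - F_3}\right\}$$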
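
The asymptotic bias is

$$E\left(\hat\Lambda_\alpha\right) - \Lambda_\alpha = -\frac{E(l')^3 + E(l'l'')}{2I^2} + O\left(\frac{1}{n^2}\right)$$

For convenience, we re-write $l'$ and $l''$ as follows.

$$l' = [n_1 - nF_1]\frac{\left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1}{F_1} + [n_2 - n(F_2 - F_1)]\frac{\left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2 - \left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1}{F_2 - F_1}$$

$$\qquad + [n_3 - n(F_3 - F_2)]\frac{\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3 - \left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2}{F_3 - F_2} + [n_4 - n(1 - F_3)]\frac{-\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3}{1 - F_3}$$

$$= \sum_{i=1}^4 z_i\frac{p_i'}{p_i}, \quad \text{where } z_i = n_i - np_i, \quad p_i' = \frac{\partial p_i}{\partial \Lambda_\alpha}$$

Similarly, with $p_1 = F_1$, $p_2 = F_2 - F_1$, $p_3 = F_3 - F_2$, $p_4 = 1 - F_3$,

$$l'' = \sum_{i=1}^4 z_i\frac{p_i''p_i - (p_i')^2}{p_i^2} - n\sum_{i=1}^4\frac{(p_i')^2}{p_i}, \quad \text{where } p_i'' = \frac{\partial^2 p_i}{\partial \Lambda_\alpha^2}$$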


We will take advantage of the central moments of the multinomial distribution:

$$E\left((n_i - np_i)^2\right) = np_i(1 - p_i)$$

$$E\left((n_i - np_i)(n_j - np_j)\right) = -np_ip_j \quad (i \ne j)$$

$$E\left((n_i - np_i)^3\right) = np_i(1 - p_i)(1 - 2p_i)$$

$$E\left((n_i - np_i)^2(n_j - np_j)\right) = -np_ip_j(1 - 2p_i) \quad (i \ne j)$$

$$E\left((n_i - np_i)(n_j - np_j)(n_k - np_k)\right) = 2np_ip_jp_k \quad (i \ne j \ne k)$$

and the following expansion,

$$(a + b + c + d)^3 = a^3 + b^3 + c^3 + d^3 + 3a^2b + 3a^2c + 3a^2d + 3ab^2 + 3b^2c + 3b^2d + 3ac^2 + 3bc^2$$

$$\qquad + 3c^2d + 3ad^2 + 3bd^2 + 3cd^2 + 6abc + 6abd + 6acd + 6bcd$$
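
The multinomial central moments listed above can be checked by exact enumeration for a small $n$. The sketch below is only a numerical sanity check; the helper names and the choice $n = 6$, $p = (0.1, 0.2, 0.3, 0.4)$ are arbitrary assumptions.

```python
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of a count vector under Multinomial(sum(counts), probs)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)          # stays an exact integer at every step
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def central_moment(n, probs, powers):
    """E[ prod_i (n_i - n p_i)^powers[i] ] by enumerating all count vectors."""
    total = 0.0
    for head in product(range(n + 1), repeat=len(probs) - 1):
        if sum(head) > n:
            continue
        counts = head + (n - sum(head),)
        term = multinomial_pmf(counts, probs)
        for c, p, a in zip(counts, probs, powers):
            term *= (c - n * p) ** a
        total += term
    return total

n, p = 6, (0.1, 0.2, 0.3, 0.4)
m_ii = central_moment(n, p, (2, 0, 0, 0))     # n p1 (1 - p1)
m_ij = central_moment(n, p, (1, 1, 0, 0))     # -n p1 p2
m_iii = central_moment(n, p, (3, 0, 0, 0))    # n p1 (1 - p1)(1 - 2 p1)
m_iij = central_moment(n, p, (2, 1, 0, 0))    # -n p1 p2 (1 - 2 p1)
m_ijk = central_moment(n, p, (1, 1, 1, 0))    # 2 n p1 p2 p3
```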
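
We are now ready to compute $E(l')^3$. Because

$$l' = \sum_{i=1}^4 z_i\frac{p_i'}{p_i}, \quad \text{where } z_i = n_i - np_i, \quad p_i' = \frac{\partial p_i}{\partial \Lambda_\alpha}$$

we need to compute

$$E\left(z_1\frac{p_1'}{p_1}\right)^3 = np_1(1 - p_1)(1 - 2p_1)\frac{(p_1')^3}{p_1^3} = n(1 - p_1)(1 - 2p_1)\frac{(p_1')^3}{p_1^2}$$

and

$$E\left[\left(z_1\frac{p_1'}{p_1}\right)^2\left(z_2\frac{p_2'}{p_2} + z_3\frac{p_3'}{p_3} + z_4\frac{p_4'}{p_4}\right)\right] = -np_1(1 - 2p_1)\frac{(p_1')^2}{p_1^2}\left(p_2' + p_3' + p_4'\right) = -n(1 - 2p_1)\frac{(p_1')^2}{p_1}\left(p_2' + p_3' + p_4'\right) = n(1 - 2p_1)\frac{(p_1')^3}{p_1}$$

$$E\left(z_1\frac{p_1'}{p_1}z_2\frac{p_2'}{p_2}z_3\frac{p_3'}{p_3}\right) = 2np_1'p_2'p_3'$$

Thus

$$E(l')^3 = \sum_{i=1}^4 n(1 - p_i)(1 - 2p_i)\frac{(p_i')^3}{p_i^2} + 3\sum_{i=1}^4 n(1 - 2p_i)\frac{(p_i')^3}{p_i} + 12np_1'p_2'p_3' + 12np_1'p_2'p_4' + 12np_2'p_3'p_4' + 12np_1'p_3'p_4'$$

$$= \sum_{i=1}^4 n\frac{(p_i')^3}{p_i^2} - 4\sum_{i=1}^4 n(p_i')^3 + 12np_1'p_2'p_3' + 12np_1'p_2'p_4' + 12np_2'p_3'p_4' + 12np_1'p_3'p_4'$$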

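Next, we compute $E(l'l'')$. Since $E(l') = 0$, the constant term $-n\sum_i (p_i')^2/p_i$ in $l''$ does not contribute, and

$$E(l'l'') = E\left[\left(\sum_{i=1}^4 z_i\frac{p_i'}{p_i}\right)\left(\sum_{i=1}^4 z_i\frac{p_i''p_i - (p_i')^2}{p_i^2}\right)\right]$$

$$= E\left[\sum_{i=1}^4 z_i^2\frac{p_i'\left(p_i''p_i - (p_i')^2\right)}{p_i^3} + \sum_{i \ne j} z_iz_j\frac{p_i'}{p_i}\frac{p_j''p_j - (p_j')^2}{p_j^2}\right]$$

$$= n\sum_{i=1}^4 (1 - p_i)\frac{p_i'\left(p_i''p_i - (p_i')^2\right)}{p_i^2} - n\sum_{i \ne j} p_i'\frac{p_j''p_j - (p_j')^2}{p_j}$$

$$= n\sum_{i=1}^4 \frac{p_i'\left(p_i''p_i - (p_i')^2\right)}{p_i^2} - n\left(\sum_{i=1}^4 p_i'\right)\left(\sum_{j=1}^4 \frac{p_j''p_j - (p_j')^2}{p_j}\right)$$

$$= n\sum_{i=1}^4 \frac{p_i'p_i''}{p_i} - n\sum_{i=1}^4 \frac{(p_i')^3}{p_i^2}$$

where the last step uses $\sum_{i=1}^4 p_i' = 0$.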

Combining the two results,

$$E(l'l'') + E(l')^3 = n\sum_{i=1}^4 \frac{p_i'p_i''}{p_i} - 4n\sum_{i=1}^4 (p_i')^3 + 12np_1'p_2'p_3' + 12np_1'p_2'p_4' + 12np_2'p_3'p_4' + 12np_1'p_3'p_4' = n\sum_{i=1}^4 \frac{p_i'p_i''}{p_i}$$

To see this, we can use the facts that $1 = \sum_{i=1}^4 p_i$, $0 = \sum_{i=1}^4 p_i'$, and

$$0 = \left(\sum_{i=1}^4 p_i'\right)^3 = \sum_{i=1}^4 (p_i')^3 + 3\sum_{i=1}^4 (p_i')^2(-p_i') + 6p_1'p_2'p_3' + 6p_1'p_2'p_4' + 6p_2'p_3'p_4' + 6p_1'p_3'p_4'$$

$$= -2\sum_{i=1}^4 (p_i')^3 + 6p_1'p_2'p_3' + 6p_1'p_2'p_4' + 6p_2'p_3'p_4' + 6p_1'p_3'p_4'$$

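Therefore,

$$E(l'l'') + E(l')^3 = n\sum_{i=1}^4 \frac{p_i'p_i''}{p_i}$$

$$= n\frac{\left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1\left[\frac{2C_1}{\Lambda_\alpha^3}f_1 + \left(\frac{C_1}{\Lambda_\alpha^2}\right)^2 f_1'\right]}{F_1}$$

$$\quad + n\frac{\left[\left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2 - \left(-\frac{C_1}{\Lambda_\alpha^2}\right)f_1\right]\left[\frac{2C_2}{\Lambda_\alpha^3}f_2 + \left(\frac{C_2}{\Lambda_\alpha^2}\right)^2 f_2' - \frac{2C_1}{\Lambda_\alpha^3}f_1 - \left(\frac{C_1}{\Lambda_\alpha^2}\right)^2 f_1'\right]}{F_2 - F_1}$$

$$\quad + n\frac{\left[\left(-\frac{C_3}{\Lambda_\alpha^2}\right)f_3 - \left(-\frac{C_2}{\Lambda_\alpha^2}\right)f_2\right]\left[\frac{2C_3}{\Lambda_\alpha^3}f_3 + \left(\frac{C_3}{\Lambda_\alpha^2}\right)^2 f_3' - \frac{2C_2}{\Lambda_\alpha^3}f_2 - \left(\frac{C_2}{\Lambda_\alpha^2}\right)^2 f_2'\right]}{F_3 - F_2}$$

$$\quad + n\frac{\left[\frac{C_3}{\Lambda_\alpha^2}f_3\right]\left[-\frac{2C_3}{\Lambda_\alpha^3}f_3 - \left(\frac{C_3}{\Lambda_\alpha^2}\right)^2 f_3'\right]}{1 - F_3}$$

where $f_i' = f_\alpha'(C_i/\Lambda_\alpha)$.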
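Because

$$I = n\left\{\frac{\left[\frac{C_1}{\Lambda_\alpha^2}f_1\right]^2}{F_1} + \frac{\left[\frac{C_2}{\Lambda_\alpha^2}f_2 - \frac{C_1}{\Lambda_\alpha^2}f_1\right]^2}{F_2 - F_1} + \frac{\left[\frac{C_3}{\Lambda_\alpha^2}f_3 - \frac{C_2}{\Lambda_\alpha^2}f_2\right]^2}{F_3 - F_2} + \frac{\left[\frac{C_3}{\Lambda_\alpha^2}f_3\right]^2}{1 - F_3}\right\} = \frac{n}{\Lambda_\alpha^2}B$$

we have, with $B$ and $D$ as defined in Theorem 3,

$$\frac{E(l'l'') + E(l')^3}{2I^2} = -\frac{\Lambda_\alpha}{n}\left(\frac{1}{B} - \frac{D}{2B^2}\right)$$

$$E\left(\hat\Lambda_\alpha\right) = \Lambda_\alpha - \frac{E(l'l'') + E(l')^3}{2I^2} + O\left(\frac{1}{n^2}\right) = \Lambda_\alpha + \frac{\Lambda_\alpha}{n}\left(\frac{1}{B} - \frac{D}{2B^2}\right) + O\left(\frac{1}{n^2}\right) = \Lambda_\alpha\left(1 + \frac{1}{nB} - \frac{D}{2nB^2}\right) + O\left(\frac{1}{n^2}\right)$$

which leads to a bias-corrected estimator

$$\hat\Lambda_{\alpha,c} = \frac{\hat\Lambda_\alpha}{1 + \frac{1}{nB} - \frac{D}{2nB^2}}$$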